TY - GEN
T1 - Deep learning for classification of malware system call sequences
AU - Kolosnjaji, Bojan
AU - Zarras, Apostolis
AU - Webster, George
AU - Eckert, Claudia
N1 - Publisher Copyright:
© Springer International Publishing AG 2016.
PY - 2016
Y1 - 2016
N2 - The increase in number and variety of malware samples amplifies the need for improvement in automatic detection and classification of the malware variants. Machine learning is a natural choice to cope with this increase, because it addresses the need of discovering underlying patterns in large-scale datasets. Nowadays, neural network methodology has been grown to the state that can surpass limitations of previous machine learning methods, such as Hidden Markov Models and Support Vector Machines. As a consequence, neural networks can now offer superior classification accuracy in many domains, such as computer vision or natural language processing. This improvement comes from the possibility of constructing neural networks with a higher number of potentially diverse layers and is known as Deep Learning. In this paper, we attempt to transfer these performance improvements to model the malware system call sequences for the purpose of malware classification. We construct a neural network based on convolutional and recurrent network layers in order to obtain the best features for classification. This way we get a hierarchical feature extraction architecture that combines convolution of n-grams with full sequential modeling. Our evaluation results demonstrate that our approach outperforms previously used methods in malware classification, being able to achieve an average of 85.6% on precision and 89.4% on recall using this combined neural network architecture.
AB - The increase in number and variety of malware samples amplifies the need for improvement in automatic detection and classification of the malware variants. Machine learning is a natural choice to cope with this increase, because it addresses the need of discovering underlying patterns in large-scale datasets. Nowadays, neural network methodology has been grown to the state that can surpass limitations of previous machine learning methods, such as Hidden Markov Models and Support Vector Machines. As a consequence, neural networks can now offer superior classification accuracy in many domains, such as computer vision or natural language processing. This improvement comes from the possibility of constructing neural networks with a higher number of potentially diverse layers and is known as Deep Learning. In this paper, we attempt to transfer these performance improvements to model the malware system call sequences for the purpose of malware classification. We construct a neural network based on convolutional and recurrent network layers in order to obtain the best features for classification. This way we get a hierarchical feature extraction architecture that combines convolution of n-grams with full sequential modeling. Our evaluation results demonstrate that our approach outperforms previously used methods in malware classification, being able to achieve an average of 85.6% on precision and 89.4% on recall using this combined neural network architecture.
UR - http://www.scopus.com/inward/record.url?scp=85007154533&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-50127-7_11
DO - 10.1007/978-3-319-50127-7_11
M3 - Conference contribution
AN - SCOPUS:85007154533
SN - 9783319501260
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 137
EP - 149
BT - AI 2016
A2 - Kang, Byeong Ho
A2 - Bai, Quan
PB - Springer Verlag
T2 - 29th Australasian Joint Conference on Artificial Intelligence, AI 2016
Y2 - 5 December 2016 through 8 December 2016
ER -