TY - GEN
T1 - Adaptive semantics-aware malware classification
AU - Kolosnjaji, Bojan
AU - Zarras, Apostolis
AU - Lengyel, Tamas
AU - Webster, George
AU - Eckert, Claudia
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2016.
PY - 2016
Y1 - 2016
AB - Automatic malware classification is an essential improvement over the widely deployed detection procedures using manual signatures or heuristics. Although there exists an abundance of methods for collecting static and behavioral malware data, there is a lack of adequate tools for analysis based on these collected features. Machine learning is a statistical solution to the automatic classification of malware variants based on heterogeneous information gathered by investigating malware code and behavioral traces. However, the recent increase in the variety of malware instances requires further development of effective and scalable automation for malware classification and analysis processes. In this paper, we investigate topic modeling approaches as semantics-aware solutions to the classification of malware based on logs from dynamic malware analysis. We combine results of static and dynamic analysis to increase the reliability of inferred class labels. We utilize a semi-supervised learning architecture to make use of unlabeled data in classification. Using a nonparametric machine learning approach to topic modeling, we design and implement a scalable solution while maintaining the advantages of semantics-aware analysis. The outcomes of our experiments reveal that our approach brings a new and improved solution to the recurring problems in malware classification and analysis.
UR - http://www.scopus.com/inward/record.url?scp=84979285094&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-40667-1_21
DO - 10.1007/978-3-319-40667-1_21
M3 - Conference contribution
AN - SCOPUS:84979285094
SN - 9783319406664
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 419
EP - 439
BT - Detection of Intrusions and Malware, and Vulnerability Assessment - 13th International Conference, DIMVA 2016, Proceedings
A2 - Zurutuza, Urko
A2 - Rodríguez, Ricardo J.
A2 - Caballero, Juan
PB - Springer Verlag
T2 - 13th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA 2016
Y2 - 7 July 2016 through 8 July 2016
ER -