Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey

Research output: Contribution to journal › Article › peer-review

198 Scopus citations

Abstract

This research synthesizes a taxonomy for classifying methods that detect new malicious code using Machine Learning (ML) applied to static features extracted from executables. The taxonomy is then operationalized to classify research on this topic and to pinpoint critical open research issues in light of emerging threats. The article addresses various facets of the detection challenge, including file representation and feature selection methods, classification algorithms, weighting ensembles, the imbalance problem, active learning, and chronological evaluation. From the survey we conclude that a framework for detecting new malicious code in executable files can be designed to achieve very high accuracy while maintaining low false positives (i.e., benign files misclassified as malicious). The framework should include training multiple classifiers on various types of features (mainly OpCode n-grams, byte n-grams, and Portable Executable features), applying a weighting algorithm to the classification results of the individual classifiers, and an active learning mechanism to maintain high detection accuracy. The training of classifiers should also account for the imbalance problem by generating classifiers that perform accurately in a real-life situation, where the percentage of malicious files among all files is estimated to be approximately 10%.
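The three ingredients named in the abstract (static n-gram features, weighting of individual classifier outputs, and a realistic class ratio) can be illustrated with a minimal Python sketch. It is not the surveyed framework: the function names, the weights, the 0.5 decision threshold, and the subsampling strategy are all assumptions made for illustration, and the per-classifier scores are taken as given rather than produced by trained models.

```python
import random
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in raw file content (a static feature)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def weighted_vote(scores, weights, threshold=0.5):
    """Weighted average of per-classifier malicious scores in [0, 1].

    Returns (combined_score, is_malicious). The weights are hypothetical;
    in practice they might reflect each classifier's validation accuracy.
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal length")
    combined = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
    return combined, combined >= threshold


def sample_malicious_for_ratio(benign, malicious, ratio=0.10, seed=0):
    """Keep all benign samples; subsample malicious to make up about `ratio`."""
    # Solve m / (m + len(benign)) = ratio for the malicious count m.
    wanted = round(ratio * len(benign) / (1.0 - ratio))
    rng = random.Random(seed)
    return rng.sample(malicious, min(wanted, len(malicious)))


if __name__ == "__main__":
    # Toy inputs only: four raw bytes, three assumed classifier scores,
    # and integer stand-ins for 90 benign and 50 malicious files.
    print(byte_ngrams(b"\x90\x90\x90\xc3", n=2))
    print(weighted_vote([0.9, 0.2, 0.7], [0.5, 0.2, 0.3]))
    print(len(sample_malicious_for_ratio(list(range(90)), list(range(50)))))
```

Subsampling the malicious class is only one way to reach the roughly 10% ratio; a real pipeline might instead reweight classes or set the ratio only in the test set.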

Original language: English
Pages (from-to): 16-29
Number of pages: 14
Journal: Information Security Technical Report
Volume: 14
Issue number: 1
State: Published - 1 Feb 2009