A time-interval-based active learning framework for enhanced PE malware acquisition and detection

Ido Finder, Eitam Sheetrit, Nir Nissim

Research output: Contribution to journal › Article › peer-review

Abstract

Malware increasingly threatens users around the world on a variety of cybernetic platforms, resulting in damages of billions of dollars each year. In recent years, in order to improve the detection capabilities of widely used antivirus (AV) tools, machine learning (ML) algorithms and dynamic malware analysis have been leveraged for the extraction and learning of rich multivariate time-series data (MTSD) associated with behavioral information. Such MTSD can be exploited using a time-interval temporal pattern (TP) mining approach; however, this approach has not been widely explored for the task of malware detection. The use of TPs enables the discovery of complex temporal relations between different variables, improves the ability to cope with missing values and noisy data, and provides explainability. In light of the continuous creation of new unknown malware on a daily basis, detection mechanisms require frequent updating to keep pace with the changing reality. Active learning (AL) can address the updatability gap by efficiently selecting and acquiring a small yet informative set of new samples while reducing the labeling efforts of experts; AL also provides maximal improvement of ML-based detection models, which can further contribute to the updatability of antimalware tools. However, the use of AL methods for the acquisition of time-interval TP-based samples has yet to be explored. In this paper, we present novel AL methods and a detection framework for improved malware detection based on dynamic analysis, time-interval TPs, and ML algorithms. The proposed framework is capable of both prioritizing the acquisition of malicious samples and improving the malware detection capabilities of ML classifiers and antimalware tools. Our proposed framework was evaluated in an extensive set of experiments on a comprehensive data collection of 9,328 portable executables (5,000 benign and 4,328 malicious) that were executed in the Windows 10 environment. The results demonstrated our AL methods’ ability to prioritize the acquisition of malware: the methods acquired up to 93.5% of the malicious files each day, allowing frequent updating of antimalware tools. In addition, our framework was shown to be effective in improving the detection capabilities of several ML classifiers over time, with the best results (AUC of 95.15%) achieved by the SVM classifier. Our framework also showed that TPs can be used to identify emerging trends in malicious behavior.

Original language: English
Article number: 102838
Journal: Computers and Security
Volume: 121
State: Published - 1 Oct 2022

Keywords

  • Active learning
  • Detection
  • Dynamic analysis
  • Malware
  • Time-series

ASJC Scopus subject areas

  • Computer Science (all)
  • Law
