Novel active learning methods for enhanced PC malware detection in windows OS

Research output: Contribution to journalArticlepeer-review

88 Scopus citations


The formation of new malwares every day poses a significant challenge to anti-virus vendors since antivirus tools, using manually crafted signatures, are only capable of identifying known malware instances and their relatively similar variants. To identify new and unknown malwares for updating their anti-virus signature repository, anti-virus vendors must daily collect new, suspicious files that need to be analyzed manually by information security experts who then label them as malware or benign. Analyzing suspected files is a time-consuming task and it is impossible to manually analyze all of them. Consequently, anti-virus vendors use machine learning algorithms and heuristics in order to reduce the number of suspect files that must be inspected manually. These techniques, however, lack an essential element - they cannot be daily updated. In this work we introduce a solution for this updatability gap. We present an active learning (AL) framework and introduce two new AL methods that will assist anti-virus vendors to focus their analytical efforts by acquiring those files that are most probably malicious. Those new AL methods are designed and oriented towards new malware acquisition. To test the capability of our methods for acquiring new malwares from a stream of unknown files, we conducted a series of experiments over a ten-day period. A comparison of our methods to existing high performance AL methods and to random selection, which is the naïve method, indicates that the AL methods outperformed random selection for all performance measures. Our AL methods outperformed existing AL method in two respects, both related to the number of new malwares acquired daily, the core measure in this study. First, our best performing AL method, termed "Exploitation", acquired on the 9th day of the experiment about 2.6 times more malwares than the existing AL method and 7.8 more times than the random selection. Secondly, while the existing AL method showed a decrease in the number of new malwares acquired over 10 days, our AL methods showed an increase and a daily improvement in the number of new malwares acquired. Both results point towards increased efficiency that can possibly assist anti-virus vendors.

Original languageEnglish
Pages (from-to)5843-5857
Number of pages15
JournalExpert Systems with Applications
Issue number13
StatePublished - 1 Oct 2014


  • Active learning
  • Machine Learning
  • Malicious code
  • Malware
  • SVM

ASJC Scopus subject areas

  • Engineering (all)
  • Computer Science Applications
  • Artificial Intelligence


Dive into the research topics of 'Novel active learning methods for enhanced PC malware detection in windows OS'. Together they form a unique fingerprint.

Cite this