ALPD: Active learning framework for enhancing the detection of malicious PDF files

Nir Nissim, Aviad Cohen, Robert Moskovitch, Assaf Shabtai, Mattan Edry, Oren Bar-Ad, Yuval Elovici

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

26 Scopus citations

Abstract

Email communication carrying malicious attachments or links is often used as an attack vector for the initial penetration of a targeted organization. Because existing defense solutions prevent executables from entering organizational networks via email, recent attacks tend to use non-executable files such as PDF documents. Machine learning algorithms have recently been applied to detecting malicious PDF files. These techniques, however, lack an essential element: they cannot be updated daily. In this study we present ALPD, a framework based on active learning (AL) methods that are specially designed to help anti-virus vendors focus their analytical efforts efficiently. This is done by identifying and acquiring new PDF files that are most likely to be malicious, as well as informative benign PDF documents. These files are then used to retrain and enhance the knowledge stores. Evaluation results show that on the final day of the experiment, Combination, one of our AL methods, outperformed all the others, enriching the anti-virus's signature repository with almost seven times more new PDF malware while also improving the detection model's performance on a daily basis.
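The abstract describes selecting new files that are either most likely malicious or informative for the model. One selection step of that kind can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' Combination method: the function name `select_for_labeling`, the signed `score` function (positive leans malicious, values near zero are least certain) and the default 50/50 split between exploitation and exploration are all hypothetical. Also note that the exploration half here picks the most uncertain files regardless of predicted class, whereas the abstract specifically mentions informative benign documents.

```python
from typing import Callable, List, Sequence, Tuple


def select_for_labeling(
    pool: Sequence[str],
    score: Callable[[str], float],
    budget: int,
    exploit_fraction: float = 0.5,
) -> List[str]:
    """Pick up to `budget` unlabeled files to send to analysts.

    Hypothetical sketch: `score(x)` is assumed to be a signed decision
    value where > 0 leans malicious, < 0 leans benign, and values near 0
    are the ones the model is least certain about.
    """
    if budget <= 0 or not pool:
        return []
    # Number of slots reserved for "most likely malicious" files,
    # clamped to [0, budget].
    n_exploit = min(budget, max(0, int(round(budget * exploit_fraction))))
    scored: List[Tuple[str, float]] = [(x, score(x)) for x in pool]

    # Exploitation: files the model scores as most likely malicious.
    by_malicious = sorted(scored, key=lambda p: p[1], reverse=True)
    chosen = [x for x, _ in by_malicious[:n_exploit]]

    # Exploration: of the remaining files, those closest to the decision
    # boundary (smallest |score|), i.e. the most uncertain ones.
    taken = set(chosen)
    by_uncertainty = sorted(
        (p for p in scored if p[0] not in taken), key=lambda p: abs(p[1])
    )
    chosen += [x for x, _ in by_uncertainty[: budget - len(chosen)]]
    return chosen


# Usage with made-up scores for five unlabeled PDF files.
pool = ["a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf"]
scores = {"a.pdf": 2.1, "b.pdf": -0.1, "c.pdf": 0.9, "d.pdf": -1.5, "e.pdf": 0.05}
print(select_for_labeling(pool, scores.__getitem__, budget=3))
# → ['a.pdf', 'c.pdf', 'e.pdf']
```

Splitting the daily budget this way trades off two goals the abstract names: adding new malware to the signature repository (exploitation) and improving the detection model with informative examples (exploration). After analysts label the chosen files, the model would be retrained and the next day's pool scored again.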

Original language: English
Title of host publication: Proceedings - 2014 IEEE Joint Intelligence and Security Informatics Conference, JISIC 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 91-98
Number of pages: 8
ISBN (Electronic): 9781479963645
State: Published - 4 Dec 2014
Event: 2014 IEEE Joint Intelligence and Security Informatics Conference, JISIC 2014 - The Hague, Netherlands
Duration: 24 Sep 2014 to 26 Sep 2014

Publication series

Name: Proceedings - 2014 IEEE Joint Intelligence and Security Informatics Conference, JISIC 2014

Conference

Conference: 2014 IEEE Joint Intelligence and Security Informatics Conference, JISIC 2014
Country/Territory: Netherlands
City: The Hague
Period: 24/09/14 to 26/09/14

Keywords

  • Active Learning
  • Machine Learning
  • Malware
  • PDF

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems
  • Software

