Unknown malcode detection using OPCODE representation

Robert Moskovitch, Clint Feher, Nir Tzachar, Eugene Berger, Marina Gitelman, Shlomi Dolev, Yuval Elovici

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

153 Scopus citations

Abstract

The recent growth in network usage has motivated the creation of new malicious code for various purposes, including economic ones. Today's signature-based antivirus tools are very accurate but cannot detect new malicious code. Recently, classification algorithms have been employed successfully to detect unknown malicious code. However, most studies represent executables by byte-sequence n-grams of their binary code. We propose instead the use of OpCodes (operation codes), generated by disassembling the executables, and use n-grams of the OpCodes as features for the classification process. We present a full methodology for the detection of unknown malicious code, based on text categorization concepts. We performed an extensive evaluation on a test collection of more than 30,000 files, in which we evaluated the OpCode n-gram representation and investigated the imbalance problem, referring to real-life scenarios in which malicious files are expected to make up about 10% of all files. Our results indicate that an accuracy greater than 99% can be achieved with a training set in which the malicious file percentage is lower than 15%, which is higher than in our previous experience with the byte-sequence n-gram representation [1].
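As a rough illustration of the OpCode n-gram representation described in the abstract, the sketch below turns an opcode sequence into n-gram features. It assumes the opcodes have already been extracted by a disassembler, and it weights each n-gram by normalized term frequency, a common choice in text categorization; the abstract does not state the weighting the authors used, and the names `opcode_ngrams`, `normalized_tf`, and the sample sequence are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def opcode_ngrams(opcodes: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous run of n opcodes, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def normalized_tf(opcodes: Sequence[str], n: int) -> Dict[Tuple[str, ...], float]:
    """Map each n-gram to its count divided by the most frequent n-gram's count.

    Assumption: this weighting is illustrative, not taken from the paper.
    """
    counts = Counter(opcode_ngrams(opcodes, n))
    if not counts:
        return {}
    peak = max(counts.values())
    return {gram: count / peak for gram, count in counts.items()}


# Hypothetical opcode sequence, as a disassembler might emit for a short routine.
sample = ["push", "mov", "push", "mov", "call", "pop", "ret"]
features = normalized_tf(sample, 2)
```

Feature dictionaries of this kind, computed per file, could then be passed to any standard classifier; which n-gram sizes and classifiers work best is a question the paper's evaluation addresses, not this sketch.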

Original language: English
Title of host publication: Intelligence and Security Informatics - First European Conference, EuroISI 2008, Proceedings
Pages: 204-215
Number of pages: 12
State: Published - 1 Dec 2008
Event: 1st European Conference on Intelligence and Security Informatics, EuroISI 2008 - Esbjerg, Denmark
Duration: 3 Dec 2008 → 5 Dec 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5376 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st European Conference on Intelligence and Security Informatics, EuroISI 2008
Country/Territory: Denmark
City: Esbjerg
Period: 3/12/08 → 5/12/08

Keywords

  • Classification
  • Malicious code detection
  • OpCode

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
