TY - GEN
T1 - Malicious code detection using active learning
AU - Moskovitch, Robert
AU - Nissim, Nir
AU - Elovici, Yuval
PY - 2009/7/16
Y1 - 2009/7/16
N2 - The recent growth in network usage has motivated the creation of new malicious code for various purposes, including economic and other malicious purposes. Currently, dozens of new malicious codes are created every day and this number is expected to increase in the coming years. Today's signature-based anti-viruses and heuristic-based methods are accurate, but cannot detect new malicious code. Recently, classification algorithms were used successfully for the detection of malicious code. We present a complete methodology for the detection of unknown malicious code, inspired by text categorization concepts. However, this approach can be exploited further to achieve a more accurate and efficient acquisition method of unknown malicious files. We use an Active-Learning framework that enables the selection of the unknown files for fast acquisition. We performed an extensive evaluation of a test collection consisting of more than 30,000 files. We present a rigorous evaluation setup, consisting of real-life scenarios, in which the malicious file content is expected to be low, at about 10% of the files in the stream. We define specific evaluation measures based on the known precision and recall measures, which show the accuracy of the acquisition process and the improvement in the classifier resulting from the efficient acquisition process.
AB - The recent growth in network usage has motivated the creation of new malicious code for various purposes, including economic and other malicious purposes. Currently, dozens of new malicious codes are created every day and this number is expected to increase in the coming years. Today's signature-based anti-viruses and heuristic-based methods are accurate, but cannot detect new malicious code. Recently, classification algorithms were used successfully for the detection of malicious code. We present a complete methodology for the detection of unknown malicious code, inspired by text categorization concepts. However, this approach can be exploited further to achieve a more accurate and efficient acquisition method of unknown malicious files. We use an Active-Learning framework that enables the selection of the unknown files for fast acquisition. We performed an extensive evaluation of a test collection consisting of more than 30,000 files. We present a rigorous evaluation setup, consisting of real-life scenarios, in which the malicious file content is expected to be low, at about 10% of the files in the stream. We define specific evaluation measures based on the known precision and recall measures, which show the accuracy of the acquisition process and the improvement in the classifier resulting from the efficient acquisition process.
UR - http://www.scopus.com/inward/record.url?scp=67650251631&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-01718-6_6
DO - 10.1007/978-3-642-01718-6_6
M3 - Conference contribution
AN - SCOPUS:67650251631
SN - 9783642017179
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 74
EP - 91
BT - Privacy, Security, and Trust in KDD - Second ACM SIGKDD International Workshop, PinKDD 2008, Revised Selected Papers
T2 - 2nd ACM SIGKDD International Workshop on Privacy, Security, and Trust in KDD, PinKDD 2008
Y2 - 24 August 2008 through 27 August 2008
ER -