Code type revealing using experiments framework

Rami Sharon, Ehud Gudes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Identifying the type of code, whether in a file or a byte stream, is a challenge that many software companies face. Many applications, security-related and otherwise, base their behavior on the type of code they receive as input. Today's traditional identification methods rely on file extensions, magic numbers, proprietary headers and trailers, or specific type-identifying rules. All of these are vulnerable to content tampering, and detecting such tampering requires long and tedious hours of work by professionals. This study aims to find a method for identifying the best settings to automatically create type signatures that effectively overcome the content manipulation problem. In this paper we lay out a framework for creating type signatures based on byte N-Grams. The framework allows setting various parameters, such as N-Gram sizes and windows, selecting statistical tests, and defining rules for score calculation. The framework serves as a test lab for finding the parameters that satisfy a predefined threshold of type identification accuracy. We demonstrate the framework using basic settings that achieved an F-Measure of 0.996 on 1400 test files.
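The signature-building idea in the abstract can be illustrated with a small sketch. It assumes that a type signature is simply the average relative frequency of byte N-Grams over training samples of one type, optionally restricted to a window at the start of the input, and it scores an unknown input by cosine similarity against each signature. The paper's framework instead lets the user select statistical tests and scoring rules, so cosine similarity here is only one illustrative stand-in; all function names (byte_ngrams, build_signature, classify, f_measure) are hypothetical and not taken from the paper.

```python
from __future__ import annotations

from collections import Counter
from math import sqrt
from typing import Optional


def byte_ngrams(data: bytes, n: int = 2, window: Optional[int] = None) -> Counter:
    """Count byte n-grams in the first `window` bytes (whole input if None)."""
    chunk = data if window is None else data[:window]
    return Counter(chunk[i:i + n] for i in range(len(chunk) - n + 1))


def _relative(counts: Counter) -> dict[bytes, float]:
    """Turn raw n-gram counts into relative frequencies."""
    size = sum(counts.values()) or 1
    return {gram: c / size for gram, c in counts.items()}


def build_signature(samples: list[bytes], n: int = 2,
                    window: Optional[int] = None) -> dict[bytes, float]:
    """Hypothetical signature: mean relative n-gram frequency over samples of one type."""
    total: Counter = Counter()
    for sample in samples:
        for gram, freq in _relative(byte_ngrams(sample, n, window)).items():
            total[gram] += freq
    return {gram: v / len(samples) for gram, v in total.items()}


def cosine(a: dict[bytes, float], b: dict[bytes, float]) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(data: bytes, signatures: dict[str, dict[bytes, float]],
             n: int = 2, window: Optional[int] = None) -> str:
    """Return the type whose signature scores highest.

    Assumption: cosine similarity stands in for the statistical tests
    and scoring rules that the framework lets the user choose.
    """
    profile = _relative(byte_ngrams(data, n, window))
    return max(signatures, key=lambda t: cosine(profile, signatures[t]))


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for one type."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with signatures built from a few samples of each known type, classify(data, signatures, n=2, window=512) returns the type whose averaged profile is closest to the input. Varying n and window and recomputing f_measure over a labelled test set mirrors, in miniature, the parameter search the framework is meant to support.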

Original language: English
Title of host publication: Data and Applications Security and Privacy XXVI - 26th Annual IFIP WG 11.3 Conference, DBSec 2012, Proceedings
Pages: 193-206
Number of pages: 14
State: Published - 1 Aug 2012
Event: 26th Annual WG 11.3 Conference on Data and Applications Security and Privacy, DBSec 2012 - Paris, France
Duration: 11 Jul 2012 to 13 Jul 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7371 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th Annual WG 11.3 Conference on Data and Applications Security and Privacy, DBSec 2012
Country/Territory: France
City: Paris
Period: 11/07/12 to 13/07/12

Keywords

  • Byte N-Gram statistical analysis
  • Code type
  • Content type revealing framework
  • File Type

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
