Malicious source code detection using a translation model

Chen Tsfaty, Michael Fire

Research output: Contribution to journal › Article › peer-review


Modern software development often relies on open-source code sharing. Open-source code reuse, however, allows hackers to access wide developer communities, thereby potentially affecting many products. An increasing number of such “supply chain attacks” have occurred in recent years, taking advantage of open-source software development practices. Here, we introduce the Malicious Source code Detection using a Translation model (MSDT) algorithm. MSDT is a novel deep-learning-based analysis method that detects real-world code injections into source code packages. We have tested MSDT by embedding examples from a dataset of over 600,000 different functions and then applying a clustering algorithm to the resulting embedding vectors to identify malicious functions by detecting outliers. We evaluated MSDT's performance with extensive experiments and demonstrated that MSDT could detect malicious code injections with precision@k values of up to 0.909.
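
The evaluation idea in the abstract (embed each function, flag outliers, report precision@k) can be illustrated with a minimal sketch. It is not the MSDT implementation: the toy 2-D vectors stand in for the translation-model embeddings, distance from the centroid is swapped in for the clustering-based outlier detection, and all names (`centroid`, `outlier_scores`, `precision_at_k`, `embeddings`, `labels`) are hypothetical.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def outlier_scores(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Score each vector by its Euclidean distance from the centroid.

    Assumption: a stand-in for the clustering step, chosen for brevity.
    """
    center = centroid(vectors)
    return [math.dist(vector, center) for vector in vectors]


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of the k highest-scoring items whose label is 1 (malicious)."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


# Toy data: five benign functions near the origin and one injected function
# far away from them. Labels: 1 = malicious, 0 = benign.
embeddings = [(0.0, 0.1), (0.1, 0.0), (0.1, 0.1), (0.0, 0.0), (5.0, 5.0), (0.05, 0.05)]
labels = [0, 0, 0, 0, 1, 0]

scores = outlier_scores(embeddings)
p_at_1 = precision_at_k(scores, labels, 1)
p_at_2 = precision_at_k(scores, labels, 2)
```

In this toy set the injected function is the farthest point from the centroid, so `p_at_1` is 1.0 and `p_at_2` is 0.5. A precision@k of 0.909, as reported in the abstract, means that about 91% of the top-k ranked functions were malicious.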

Original language: English
Article number: 100773
Issue number: 7
State: Published - 14 Jul 2023


Keywords

  • PyPi
  • deep learning
  • malware analysis
  • open source
  • software supply chain attack
  • static analysis

ASJC Scopus subject areas

  • General Decision Sciences


