Malicious source code detection using a translation model

    Research output: Contribution to journal › Article › peer-review

    5 Scopus citations

    Abstract

    Modern software development often relies on open-source code sharing. Open-source code reuse, however, allows hackers to access wide developer communities, thereby potentially affecting many products. An increasing number of such “supply chain attacks” have occurred in recent years, taking advantage of open-source software development practices. Here, we introduce the Malicious Source code Detection using a Translation model (MSDT) algorithm. MSDT is a novel deep-learning-based analysis method that detects real-world code injections into source code packages. We have tested MSDT by embedding examples from a dataset of over 600,000 different functions and then applying a clustering algorithm to the resulting embedding vectors to identify malicious functions by detecting outliers. We evaluated MSDT's performance with extensive experiments and demonstrated that MSDT could detect malicious code injections with precision@k values of up to 0.909.
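    The outlier-detection and evaluation steps in the abstract can be illustrated with a small sketch. The abstract does not name the clustering algorithm or the outlier criterion, so this example assumes k-means with "distance to the nearest centroid" as the outlier score. It also assumes the function embeddings already exist; the toy 2-D vectors below stand in for the output of MSDT's translation model, which is not reproduced here. The names `kmeans`, `outlier_scores` and `precision_at_k` are hypothetical helpers, not part of any published MSDT code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Lloyd's k-means; centroids start at the first k points so runs are deterministic."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current centroid.
            nearest = min(range(k), key=lambda i: _dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def outlier_scores(points: List[Vector], k: int) -> List[float]:
    """Score each function by its distance to the nearest cluster centroid."""
    centroids = kmeans(points, k)
    return [min(_dist(p, c) for c in centroids) for p in points]


def precision_at_k(scores: List[float], labels: List[int], k: int) -> float:
    """Fraction of the k highest-scoring items whose label is 1 (malicious)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


# Toy 2-D "embeddings": two tight groups of benign functions plus two far-off points.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, -3.0), (-4.0, 9.0),
]
labels = [0] * 8 + [1, 1]  # 1 marks a (hypothetical) injected function

scores = outlier_scores(points, k=2)
print(precision_at_k(scores, labels, k=2))
```

    Precision@k fits this setting because an analyst inspects only the k most suspicious functions. It measures how many of those top-ranked items are truly malicious, regardless of how the remaining functions are ordered. A density-based method such as DBSCAN, which labels sparse points as noise, would be an equally plausible choice of clustering step.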

    Original language: English
    Article number: 100773
    Journal: Patterns
    Volume: 4
    Issue number: 7
    State: Published, 14 Jul 2023

    Keywords

    • PyPi
    • deep learning
    • malware analysis
    • open source
    • software supply chain attack
    • static analysis

    ASJC Scopus subject areas

    • General Decision Sciences
