TY - GEN
T1 - Zero-Shot Detection of LLM-Generated Code via Approximated Task Conditioning
AU - Ashkenazi, Maor
AU - Brenner, Ofir
AU - Shohet, Tal Furman
AU - Treister, Eran
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026/1/1
Y1 - 2026/1/1
N2 - Detecting Large Language Model (LLM)-generated code is a growing challenge with implications for security, intellectual property, and academic integrity. We investigate the role of conditional probability distributions in improving zero-shot LLM-generated code detection when considering both the code and the corresponding task prompt that generated it. Our key insight is that when evaluating the probability distribution of code tokens using an LLM, there is little difference between LLM-generated and human-written code. However, conditioning on the task reveals notable differences. This contrasts with natural language text, where differences exist even in the unconditional distributions. Leveraging this, we propose a novel zero-shot detection approach that approximates the original task used to generate a given code snippet and then evaluates token-level entropy under the approximated task conditioning (ATC). We further provide a mathematical intuition, contextualizing our method relative to previous approaches. ATC requires neither access to the generator LLM nor the original task prompts, making it practical for real-world applications. To the best of our knowledge, it achieves state-of-the-art results across benchmarks and generalizes across programming languages, including Python, C++, and Java. Our findings highlight the importance of task-level conditioning for LLM-generated code detection. The supplementary materials and code are available at https://github.com/maorash/ATC, including the dataset-gathering implementation, to foster further research in this area.
AB - Detecting Large Language Model (LLM)-generated code is a growing challenge with implications for security, intellectual property, and academic integrity. We investigate the role of conditional probability distributions in improving zero-shot LLM-generated code detection when considering both the code and the corresponding task prompt that generated it. Our key insight is that when evaluating the probability distribution of code tokens using an LLM, there is little difference between LLM-generated and human-written code. However, conditioning on the task reveals notable differences. This contrasts with natural language text, where differences exist even in the unconditional distributions. Leveraging this, we propose a novel zero-shot detection approach that approximates the original task used to generate a given code snippet and then evaluates token-level entropy under the approximated task conditioning (ATC). We further provide a mathematical intuition, contextualizing our method relative to previous approaches. ATC requires neither access to the generator LLM nor the original task prompts, making it practical for real-world applications. To the best of our knowledge, it achieves state-of-the-art results across benchmarks and generalizes across programming languages, including Python, C++, and Java. Our findings highlight the importance of task-level conditioning for LLM-generated code detection. The supplementary materials and code are available at https://github.com/maorash/ATC, including the dataset-gathering implementation, to foster further research in this area.
KW - LLMs
KW - Synthetic code detection
KW - Synthetic text detection
UR - https://www.scopus.com/pages/publications/105019052532
U2 - 10.1007/978-3-032-06078-5_11
DO - 10.1007/978-3-032-06078-5_11
M3 - Conference contribution
AN - SCOPUS:105019052532
SN - 9783032060778
T3 - Lecture Notes in Computer Science
SP - 187
EP - 204
BT - Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2025, Proceedings
A2 - Ribeiro, Rita P.
A2 - Pfahringer, Bernhard
A2 - Japkowicz, Nathalie
A2 - Larrañaga, Pedro
A2 - Jorge, Alípio M.
A2 - Soares, Carlos
A2 - Abreu, Pedro H.
A2 - Gama, João
PB - Springer Science and Business Media Deutschland GmbH
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2025
Y2 - 15 September 2025 through 19 September 2025
ER -