TY - GEN
T1 - DeepReflect
T2 - 30th USENIX Security Symposium, USENIX Security 2021
AU - Downing, Evan
AU - Mirsky, Yisroel
AU - Park, Kyuhong
AU - Lee, Wenke
N1 - Publisher Copyright:
© 2021 by The USENIX Association. All rights reserved.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - Deep learning has continued to show promising results for malware classification. However, to identify key malicious behaviors, malware analysts are still tasked with reverse engineering unknown malware binaries using static analysis tools, which can take hours. Although machine learning can be used to help identify important parts of a binary, supervised approaches are impractical due to the expense of acquiring a sufficiently large labeled dataset. To increase the productivity of static (or manual) reverse engineering, we propose DEEPREFLECT: a tool for localizing and identifying malware components within a malicious binary. To localize malware components, we use an unsupervised deep neural network in a novel way, and classify the components through a semi-supervised cluster analysis, where analysts incrementally provide labels during their daily work flow. The tool is practical since it requires no data labeling to train the localization model, and minimal/noninvasive labeling to train the classifier incrementally. In our evaluation with five malware analysts on over 26k malware samples, we found that DEEPREFLECT reduces the number of functions that an analyst needs to reverse engineer by 85% on average. Our approach also detects 80% of the malware components compared to 43% when using a signature-based tool (CAPA). Furthermore, DEEPREFLECT performs better with our proposed autoencoder than SHAP (an AI explanation tool). This is significant because SHAP, a state-of-the-art method, requires a labeled dataset and autoencoders do not.
AB - Deep learning has continued to show promising results for malware classification. However, to identify key malicious behaviors, malware analysts are still tasked with reverse engineering unknown malware binaries using static analysis tools, which can take hours. Although machine learning can be used to help identify important parts of a binary, supervised approaches are impractical due to the expense of acquiring a sufficiently large labeled dataset. To increase the productivity of static (or manual) reverse engineering, we propose DEEPREFLECT: a tool for localizing and identifying malware components within a malicious binary. To localize malware components, we use an unsupervised deep neural network in a novel way, and classify the components through a semi-supervised cluster analysis, where analysts incrementally provide labels during their daily work flow. The tool is practical since it requires no data labeling to train the localization model, and minimal/noninvasive labeling to train the classifier incrementally. In our evaluation with five malware analysts on over 26k malware samples, we found that DEEPREFLECT reduces the number of functions that an analyst needs to reverse engineer by 85% on average. Our approach also detects 80% of the malware components compared to 43% when using a signature-based tool (CAPA). Furthermore, DEEPREFLECT performs better with our proposed autoencoder than SHAP (an AI explanation tool). This is significant because SHAP, a state-of-the-art method, requires a labeled dataset and autoencoders do not.
UR - http://www.scopus.com/inward/record.url?scp=85114507671&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85114507671
T3 - Proceedings of the 30th USENIX Security Symposium
SP - 3469
EP - 3486
BT - Proceedings of the 30th USENIX Security Symposium
PB - USENIX Association
Y2 - 11 August 2021 through 13 August 2021
ER -