TY - GEN
T1 - Harnessing Static Analysis to Help Learn Pseudo-Inverses of String Manipulating Procedures for Automatic Test Generation
AU - Ish-Shalom, Oren
AU - Itzhaky, Shachar
AU - Manevich, Roman
AU - Rinetzky, Noam
N1 - Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - We present a novel approach based on supervised machine-learning for inverting String Manipulating Procedures (SMPs), i.e., given an SMP, we compute a partial pseudo-inverse function such that given a target string, if then. The motivation for addressing this problem is the difficulties faced by modern symbolic execution tools, e.g., KLEE, to find ways to execute loops inside SMPs in a way which produces specific outputs required to enter a specific branch. Thus, we find ourselves in a pleasant situation where program analysis assists machine learning to help program analysis. Our basic attack on the problem is to train a machine learning algorithm using (output, input) pairs generated by executing p on random inputs. Unfortunately, naively applying this technique is extremely expensive due to the size of the alphabet. To remedy this situation, we present a specialized static analysis algorithm that can drastically reduce the size of the alphabet from which examples are drawn without sacrificing the ability to cover all the behaviors of the analyzed procedure. Our key observation is that often a procedure treats many characters in a particular uniform way: it only copies them from the input to the output in an order-preserving fashion. Our static analysis finds these good characters so that our learning algorithm may consider examples coming from a reduced alphabet containing a single representative good character, thus allowing to produce smaller models while using fewer examples than had the full alphabet been used. We then utilize the learned pseudo-inverse function to invert specific desired outputs by translating a given query to and from the reduced alphabet. We implemented our approach using two machine learning algorithms and show that indeed our string inverters can find inputs that can drive a selection of procedures taken from real-life software to produce desired outputs, whereas KLEE, a state-of-the-art symbolic execution engine, fails to find such inputs.
AB - We present a novel approach based on supervised machine-learning for inverting String Manipulating Procedures (SMPs), i.e., given an SMP, we compute a partial pseudo-inverse function such that given a target string, if then. The motivation for addressing this problem is the difficulties faced by modern symbolic execution tools, e.g., KLEE, to find ways to execute loops inside SMPs in a way which produces specific outputs required to enter a specific branch. Thus, we find ourselves in a pleasant situation where program analysis assists machine learning to help program analysis. Our basic attack on the problem is to train a machine learning algorithm using (output, input) pairs generated by executing p on random inputs. Unfortunately, naively applying this technique is extremely expensive due to the size of the alphabet. To remedy this situation, we present a specialized static analysis algorithm that can drastically reduce the size of the alphabet from which examples are drawn without sacrificing the ability to cover all the behaviors of the analyzed procedure. Our key observation is that often a procedure treats many characters in a particular uniform way: it only copies them from the input to the output in an order-preserving fashion. Our static analysis finds these good characters so that our learning algorithm may consider examples coming from a reduced alphabet containing a single representative good character, thus allowing to produce smaller models while using fewer examples than had the full alphabet been used. We then utilize the learned pseudo-inverse function to invert specific desired outputs by translating a given query to and from the reduced alphabet. We implemented our approach using two machine learning algorithms and show that indeed our string inverters can find inputs that can drive a selection of procedures taken from real-life software to produce desired outputs, whereas KLEE, a state-of-the-art symbolic execution engine, fails to find such inputs.
UR - http://www.scopus.com/inward/record.url?scp=85079105828&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-39322-9_9
DO - 10.1007/978-3-030-39322-9_9
M3 - Conference contribution
AN - SCOPUS:85079105828
SN - 9783030393212
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 180
EP - 201
BT - Verification, Model Checking, and Abstract Interpretation - 21st International Conference, VMCAI 2020, Proceedings
A2 - Beyer, Dirk
A2 - Zufferey, Damien
PB - Springer
T2 - 21st International Conference on Verification, Model Checking, and Abstract Interpretation, VMCAI 2020
Y2 - 16 January 2020 through 21 January 2020
ER -