Harnessing Static Analysis to Help Learn Pseudo-Inverses of String Manipulating Procedures for Automatic Test Generation

Oren Ish-Shalom, Shachar Itzhaky, Roman Manevich, Noam Rinetzky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We present a novel approach based on supervised machine learning for inverting String Manipulating Procedures (SMPs), i.e., given an SMP p, we compute a partial pseudo-inverse function such that, given a target string, if the pseudo-inverse returns an input for it, then executing p on that input produces the target string. The motivation for addressing this problem is the difficulty that modern symbolic execution tools, e.g., KLEE, have in finding ways to execute loops inside SMPs so that they produce the specific outputs required to enter a specific branch. Thus, we find ourselves in a pleasant situation where program analysis assists machine learning to help program analysis. Our basic attack on the problem is to train a machine learning algorithm using (output, input) pairs generated by executing p on random inputs. Unfortunately, naively applying this technique is extremely expensive due to the size of the alphabet. To remedy this situation, we present a specialized static analysis algorithm that can drastically reduce the size of the alphabet from which examples are drawn without sacrificing the ability to cover all the behaviors of the analyzed procedure. Our key observation is that a procedure often treats many characters in a particular uniform way: it only copies them from the input to the output in an order-preserving fashion. Our static analysis finds these good characters so that our learning algorithm may consider examples coming from a reduced alphabet containing a single representative good character, which allows it to produce smaller models from fewer examples than the full alphabet would require. We then utilize the learned pseudo-inverse function to invert specific desired outputs by translating a given query to and from the reduced alphabet.
We implemented our approach using two machine learning algorithms and show that indeed our string inverters can find inputs that can drive a selection of procedures taken from real-life software to produce desired outputs, whereas KLEE, a state-of-the-art symbolic execution engine, fails to find such inputs.
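
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the toy procedure `p`, the set `SPECIAL`, the representative `GOOD_REP`, and all helper names are hypothetical; the set of good characters is given by hand rather than found by static analysis; and a plain lookup table stands in for the machine learning algorithm. The sketch only shows how (output, input) pairs over a reduced alphabet can answer a query over the full alphabet.

```python
import random

# Assumptions for illustration only: spaces are the one character the toy
# procedure treats specially; every other character is "good" (copied in order).
SPECIAL = {" "}
GOOD_REP = "a"  # hypothetical single representative of all good characters


def p(s: str) -> str:
    """Toy SMP: drop spaces and copy every other character, preserving order."""
    return "".join(ch for ch in s if ch != " ")


def collect_pairs(n: int, max_len: int, rng: random.Random) -> dict:
    """Execute p on random inputs over the reduced alphabet.

    Returns a table mapping each observed output to one input producing it.
    This lookup table is a stand-in for a learned model.
    """
    alphabet = sorted(SPECIAL) + [GOOD_REP]
    table = {}
    for _ in range(n):
        length = rng.randint(0, max_len)
        s = "".join(rng.choice(alphabet) for _ in range(length))
        table.setdefault(p(s), s)  # keep the first input seen per output
    return table


def to_reduced(t: str):
    """Replace each good character by GOOD_REP and remember the originals."""
    goods = [ch for ch in t if ch not in SPECIAL]
    reduced = "".join(ch if ch in SPECIAL else GOOD_REP for ch in t)
    return reduced, goods


def from_reduced(s: str, goods: list) -> str:
    """Put the remembered good characters back, in order."""
    it = iter(goods)
    return "".join(ch if ch in SPECIAL else next(it) for ch in s)


def invert(t: str, table: dict):
    """Partial pseudo-inverse: return s with p(s) == t, or None if unknown."""
    reduced, goods = to_reduced(t)
    candidate = table.get(reduced)
    if candidate is None:
        return None
    # Guard: the candidate must contain exactly one slot per good character.
    if sum(ch not in SPECIAL for ch in candidate) != len(goods):
        return None
    s = from_reduced(candidate, goods)
    return s if p(s) == t else None  # validate before answering


if __name__ == "__main__":
    table = collect_pairs(2000, 6, random.Random(0))
    print(repr(invert("xyz", table)))
```

Because the toy procedure copies good characters in order, the k-th representative in the candidate input corresponds to the k-th good character of the query, which is what makes the translation back to the full alphabet sound in this sketch. The final check `p(s) == t` keeps the function a partial pseudo-inverse: it either returns a validated input or nothing.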

Original language: English
Title of host publication: Verification, Model Checking, and Abstract Interpretation - 21st International Conference, VMCAI 2020, Proceedings
Editors: Dirk Beyer, Damien Zufferey
Publisher: Springer
Pages: 180-201
Number of pages: 22
ISBN (Print): 9783030393212
DOIs
State: Published - 1 Jan 2020
Event: 21st International Conference on Verification, Model Checking, and Abstract Interpretation, VMCAI 2020 - New Orleans, United States
Duration: 16 Jan 2020 → 21 Jan 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11990 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Conference on Verification, Model Checking, and Abstract Interpretation, VMCAI 2020
Country/Territory: United States
City: New Orleans
Period: 16/01/20 → 21/01/20

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
