TY - GEN
T1 - Optimizing Counterfactual-based Analysis of Machine Learning Models Through Databases
AU - Arie, Aviv Ben
AU - Deutch, Daniel
AU - Frost, Nave
AU - Horesh, Yair
AU - Meyuhas, Idan
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/3/18
Y1 - 2024/3/18
AB - In the context of Machine Learning models, counterfactuals (CFs) are hypothetical perturbations to a given input of the model that would result in a different classification outcome. Multiple lines of recent work have proposed algorithms for finding CFs (hereafter referred to as CF Generators) and demonstrated their value in providing insights for model owners. However, obtaining these insights may be computationally expensive, often requiring many invocations of these algorithms with complex constraints. In this work, we complement these efforts by presenting CFDB: a relational, declarative framework for CF-based analysis. Users of CFDB specify analysis tasks as declarative queries over a relational schema tailored for CFs. CFDB then compiles the specification into a series of CF requests, to be fed as input to CF Generators. The main advantage of this approach is that it allows one to optimize the trade-off between CF generation time and quality. Specifically, our optimizations are based on the observation that multiple CF requests can often be satisfied using the same CFs, thereby reducing the total number of costly CF Generator invocations. We design algorithms that identify when such reuse is possible and optimize the computation accordingly. We experimentally demonstrate the usefulness of our approach and our optimizations in the context of multiple datasets, multiple previously proposed CF Generators, and use cases such as assessing model fairness.
UR - http://www.scopus.com/inward/record.url?scp=85190948030&partnerID=8YFLogxK
U2 - 10.48786/edbt.2024.51
DO - 10.48786/edbt.2024.51
M3 - Conference contribution
AN - SCOPUS:85190948030
T3 - Advances in Database Technology - EDBT
SP - 597
EP - 609
BT - Advances in Database Technology - EDBT
PB - OpenProceedings.org
T2 - 27th International Conference on Extending Database Technology, EDBT 2024
Y2 - 25 March 2024 through 28 March 2024
ER -