Abstract
Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations see growing use in real-world applications, the inability to control their content becomes an increasingly important problem. This paper formulates the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. Our formulation consists of a constrained, linear minimax game. We consider different concept-identification objectives, modeled after several tasks such as classification and regression. We derive a closed-form solution for certain objectives, and propose a convex relaxation, R-LACE, that works well for others. When evaluated in the context of binary gender removal, our method recovers a low-dimensional subspace whose removal mitigates bias as measured by both intrinsic and extrinsic evaluation. We show that the method, despite being linear, is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
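The core idea of linear concept erasure can be illustrated with a minimal sketch: train a linear probe for the concept, then project the representations onto the orthogonal complement of the probe's weight direction so that a fresh linear probe can no longer recover the concept. This is only an illustration of subspace removal on synthetic data, not the paper's R-LACE algorithm, which finds the subspace via a constrained minimax game; all names and hyperparameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "embeddings": a binary concept y is encoded along one direction e0.
n, d = 2000, 16
y = rng.integers(0, 2, n)
signal = np.zeros(d)
signal[0] = 2.0
X = rng.normal(size=(n, d)) + np.outer(2 * y - 1, signal)

def fit_probe(X, y, steps=800, lr=0.5):
    """Logistic-regression probe trained by plain gradient descent (numpy only)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        z = np.clip(X @ w + b, -30, 30)       # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        g = p - y                              # gradient of the logistic loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def acc(X, y, w, b):
    return (((X @ w + b) > 0).astype(int) == y).mean()

w, b = fit_probe(X, y)
print("probe accuracy before erasure:", acc(X, y, w, b))

# Erase the concept direction: project onto the orthogonal complement of w.
u = w / np.linalg.norm(w)
P = np.eye(d) - np.outer(u, u)                 # rank-(d-1) orthogonal projection
X_erased = X @ P

w2, b2 = fit_probe(X_erased, y)
print("probe accuracy after erasure:", acc(X_erased, y, w2, b2))
```

On this synthetic data the concept lies in a single direction, so one projection suffices and the post-erasure probe drops toward chance accuracy; in general the concept may span a rank-k subspace, which is exactly what the paper's minimax formulation is designed to identify.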
| Original language | English |
|---|---|
| Pages (from-to) | 18400-18421 |
| Number of pages | 22 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 162 |
| State | Published - 1 Jan 2022 |
| Externally published | Yes |
| Event | 39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States |
| Event duration | 17 Jul 2022 → 23 Jul 2022 |
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence