Abstract
Cross-lingual Extractive Question Answering (EQA) extends standard EQA by requiring models to find answers in passages written in languages different from that of the question. The Generalized Cross-Lingual Transfer (G-XLT) task evaluates models' zero-shot ability to transfer question-answering capabilities across languages using only English training data. While previous research has primarily focused on scenarios where an answer is always present, real-world applications often encounter situations where no answer exists within the given context. This paper introduces an enhanced G-XLT task definition that explicitly handles unanswerable questions, bridging a critical gap in current research. To address this challenge, we present two new datasets, miXQuAD and MLQA-IDK, which contain both answerable and unanswerable questions and cover 12 and 7 language pairs, respectively. Our study evaluates state-of-the-art large language models using fine-tuning, parameter-efficient techniques, and in-context learning, revealing a trade-off between a smaller fine-tuned model's performance on answerable questions and a larger in-context-learning model's capability on unanswerable questions. We also examine language similarity patterns based on model performance, finding alignments with known language families.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025) |
| Editors | Lea Frermann, Mark Stevenson |
| Place of Publication | Suzhou, China |
| Publisher | Association for Computational Linguistics |
| Pages | 100-121 |
| Number of pages | 22 |
| ISBN (Print) | 9798891763401 |
| DOIs | |
| State | Published - 1 Nov 2025 |
Cross-Lingual Extractive Question Answering with Unanswerable Questions