TY - GEN
T1 - DoWe Know WhatWe Don't Know? Studying Unanswerable Questions beyond SQuAD 2.0
AU - Sulem, Elior
AU - Hay, Jamaal
AU - Roth, Dan
N1 - Publisher Copyright:
© 2021 Association for Computational Linguistics.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - Understanding when a text snippet does not provide a sought after information is an essential part of natural language understanding. Recent work (SQuAD 2.0, Rajpurkar et al., 2018) has attempted to make some progress in this direction by enriching the SQuAD dataset for the Extractive QA task with unanswerable questions. However, as we show, the performance of a top system trained on SQuAD 2.0 drops considerably in out-of-domain scenarios, limiting its use in practical situations. In order to study this we build an out-of-domain corpus, focusing on simple event-based questions and distinguish between two types of IDK questions: competitive questions, where the context includes an entity of the same type as the expected answer, and simpler, noncompetitive questions where there is no entity of the same type in the context. We find that SQuAD 2.0-based models fail even in the case of the simpler questions. We then analyze the similarities and differences between the IDK phenomenon in Extractive QA and the Recognizing Textual Entailments task (RTE, Dagan et al., 2013) and investigate the extent to which the latter can be used to improve the performance.
AB - Understanding when a text snippet does not provide a sought after information is an essential part of natural language understanding. Recent work (SQuAD 2.0, Rajpurkar et al., 2018) has attempted to make some progress in this direction by enriching the SQuAD dataset for the Extractive QA task with unanswerable questions. However, as we show, the performance of a top system trained on SQuAD 2.0 drops considerably in out-of-domain scenarios, limiting its use in practical situations. In order to study this we build an out-of-domain corpus, focusing on simple event-based questions and distinguish between two types of IDK questions: competitive questions, where the context includes an entity of the same type as the expected answer, and simpler, noncompetitive questions where there is no entity of the same type in the context. We find that SQuAD 2.0-based models fail even in the case of the simpler questions. We then analyze the similarities and differences between the IDK phenomenon in Extractive QA and the Recognizing Textual Entailments task (RTE, Dagan et al., 2013) and investigate the extent to which the latter can be used to improve the performance.
UR - http://www.scopus.com/inward/record.url?scp=85128624349&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85128624349
T3 - Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021
SP - 4543
EP - 4548
BT - Findings of the Association for Computational Linguistics, Findings of ACL
A2 - Moens, Marie-Francine
A2 - Huang, Xuanjing
A2 - Specia, Lucia
A2 - Yih, Scott Wen-Tau
PB - Association for Computational Linguistics (ACL)
T2 - 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021
Y2 - 7 November 2021 through 11 November 2021
ER -