Do We Know What We Don't Know? Studying Unanswerable Questions beyond SQuAD 2.0

Elior Sulem, Jamaal Hay, Dan Roth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Scopus citations

Abstract

Understanding when a text snippet does not provide the sought-after information is an essential part of natural language understanding. Recent work (SQuAD 2.0, Rajpurkar et al., 2018) has attempted to make progress in this direction by enriching the SQuAD dataset for the Extractive QA task with unanswerable questions. However, as we show, the performance of a top system trained on SQuAD 2.0 drops considerably in out-of-domain scenarios, limiting its use in practical situations. To study this, we build an out-of-domain corpus focusing on simple event-based questions, and we distinguish between two types of IDK ("I don't know", i.e. unanswerable) questions: competitive questions, where the context includes an entity of the same type as the expected answer, and simpler, noncompetitive questions, where there is no entity of the same type in the context. We find that SQuAD 2.0-based models fail even in the case of the simpler questions. We then analyze the similarities and differences between the IDK phenomenon in Extractive QA and the Recognizing Textual Entailment task (RTE, Dagan et al., 2013) and investigate the extent to which the latter can be used to improve performance.
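
The competitive/noncompetitive distinction described in the abstract can be illustrated with a small sketch. This is not the authors' corpus-construction code: the function name `idk_question_type` and the entity-type labels (`LOCATION`, `PERSON`, `DATE`) are hypothetical, and the sketch assumes the entity types found in the context have already been labeled by hand or by some tagger.

```python
from typing import Iterable


def idk_question_type(expected_answer_type: str,
                      context_entity_types: Iterable[str]) -> str:
    """Label an unanswerable (IDK) question by the abstract's definition.

    Assumption: entity types are supplied as plain strings; how they are
    obtained (manual annotation, an NER tagger, ...) is outside this sketch.
    """
    wanted = expected_answer_type.strip().lower()
    present = {t.strip().lower() for t in context_entity_types}
    # Competitive: the context contains an entity of the same type as the
    # expected answer, even though it does not answer the question.
    return "competitive" if wanted in present else "noncompetitive"


# Hypothetical example: a "where" question expects a LOCATION answer.
print(idk_question_type("LOCATION", ["PERSON", "LOCATION"]))
print(idk_question_type("LOCATION", ["PERSON", "DATE"]))
```

In the first call the context mentions some location, so a model could be tempted to return it; in the second there is no candidate of the right type, which is the case the abstract calls simpler.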

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics, Findings of ACL
Subtitle of host publication: EMNLP 2021
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-Tau Yih
Publisher: Association for Computational Linguistics (ACL)
Pages: 4543-4548
Number of pages: 6
ISBN (Electronic): 9781955917100
State: Published - 1 Jan 2021
Externally published: Yes
Event: 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 - Punta Cana, Dominican Republic
Duration: 7 Nov 2021 to 11 Nov 2021

Publication series

Name: Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021

Conference

Conference: 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021
Country/Territory: Dominican Republic
City: Punta Cana
Period: 7/11/21 to 11/11/21

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
