Syntactic parsing of web queries with question intent

Yuval Pinter, Roi Reichart, Idan Szpektor

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

17 Scopus citations

Abstract

Accurate automatic processing of Web queries is important for high-quality information retrieval from the Web. While the syntactic structure of a large portion of these queries is trivial, the structure of queries with question intent is much richer. In this paper we therefore address the task of statistical syntactic parsing of such queries. We first show that the standard dependency grammar does not account for the full range of syntactic structures manifested by queries with question intent. To alleviate this issue we extend the dependency grammar to account for segments - independent syntactic units within a potentially larger syntactic structure. We then propose two distant supervision approaches for the task. Both algorithms do not require manually parsed queries for training. Instead, they are trained on millions of (query, page title) pairs from the Community Question Answering (CQA) domain, where the CQA page was clicked by the user who initiated the query in a search engine. Experiments on a new treebank1 consisting of 5,000 Web queries from the CQA domain, manually parsed using the proposed grammar, show that our algorithms outperform alternative approaches trained on various sources: tens of thousands of manually parsed OntoNotes sentences, millions of unlabeled CQA queries and thousands of manually segmented CQA queries.

Original languageEnglish
Title of host publication2016 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publicationHuman Language Technologies, NAACL HLT 2016 - Proceedings of the Conference
PublisherAssociation for Computational Linguistics (ACL)
Pages670-680
Number of pages11
ISBN (Electronic)9781941643914
DOIs
StatePublished - 1 Jan 2016
Externally publishedYes
Event15th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - San Diego, United States
Duration: 12 Jun 201617 Jun 2016

Publication series

Name2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference

Conference

Conference15th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016
Country/TerritoryUnited States
CitySan Diego
Period12/06/1617/06/16

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Fingerprint

Dive into the research topics of 'Syntactic parsing of web queries with question intent'. Together they form a unique fingerprint.

Cite this