TY - GEN
T1 - Syntactic parsing of web queries with question intent
AU - Pinter, Yuval
AU - Reichart, Roi
AU - Szpektor, Idan
N1 - Publisher Copyright: © 2016 Association for Computational Linguistics.
PY - 2016/1/1
Y1 - 2016/1/1
N2 - Accurate automatic processing of Web queries is important for high-quality information retrieval from the Web. While the syntactic structure of a large portion of these queries is trivial, the structure of queries with question intent is much richer. In this paper we therefore address the task of statistical syntactic parsing of such queries. We first show that the standard dependency grammar does not account for the full range of syntactic structures manifested by queries with question intent. To alleviate this issue we extend the dependency grammar to account for segments - independent syntactic units within a potentially larger syntactic structure. We then propose two distant supervision approaches for the task. Both algorithms do not require manually parsed queries for training. Instead, they are trained on millions of (query, page title) pairs from the Community Question Answering (CQA) domain, where the CQA page was clicked by the user who initiated the query in a search engine. Experiments on a new treebank consisting of 5,000 Web queries from the CQA domain, manually parsed using the proposed grammar, show that our algorithms outperform alternative approaches trained on various sources: tens of thousands of manually parsed OntoNotes sentences, millions of unlabeled CQA queries and thousands of manually segmented CQA queries.
AB - Accurate automatic processing of Web queries is important for high-quality information retrieval from the Web. While the syntactic structure of a large portion of these queries is trivial, the structure of queries with question intent is much richer. In this paper we therefore address the task of statistical syntactic parsing of such queries. We first show that the standard dependency grammar does not account for the full range of syntactic structures manifested by queries with question intent. To alleviate this issue we extend the dependency grammar to account for segments - independent syntactic units within a potentially larger syntactic structure. We then propose two distant supervision approaches for the task. Both algorithms do not require manually parsed queries for training. Instead, they are trained on millions of (query, page title) pairs from the Community Question Answering (CQA) domain, where the CQA page was clicked by the user who initiated the query in a search engine. Experiments on a new treebank consisting of 5,000 Web queries from the CQA domain, manually parsed using the proposed grammar, show that our algorithms outperform alternative approaches trained on various sources: tens of thousands of manually parsed OntoNotes sentences, millions of unlabeled CQA queries and thousands of manually segmented CQA queries.
UR - http://www.scopus.com/inward/record.url?scp=84994184183&partnerID=8YFLogxK
U2 - 10.18653/v1/n16-1081
DO - 10.18653/v1/n16-1081
M3 - Conference contribution
AN - SCOPUS:84994184183
T3 - 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference
SP - 670
EP - 680
BT - 2016 Conference of the North American Chapter of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
T2 - 15th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016
Y2 - 12 June 2016 through 17 June 2016
ER -