Title
Matching parse thickets for open domain question answering.
Abstract
Traditional parse trees are combined and enriched with anaphora and rhetoric information to form a unified representation for a paragraph of text. We refer to these representations as parse thickets. They are introduced to support answering complex questions that include multiple sentences, so as to address as many of the constraints expressed in the question as possible. The question answering system is designed so that an initial set of answers, obtained by a TF*IDF or other keyword search model, is re-ranked. Passage re-ranking is performed by matching the parse thickets of the answers with the parse thicket of the question. To do that, a graph representation and matching technique for parse structures of paragraphs of text has been developed. We define the generalization of two parse thickets, used as a measure of the distance between paragraphs of text, as the maximal common sub-graph of these parse thickets. A special case of parse thickets, a rhetoric map of an answer, allows leveraging discourse for relevance in a rule-based manner. Passage re-ranking improvement via parse thickets is evaluated in a variety of search domains with long questions. Using parse thickets improves search accuracy compared with the bag-of-words, pairwise matching of parse trees for sentences, and tree kernel approaches. As a baseline, we use a web search engine API, which provides much more accurate search results than the majority of search benchmarks, such as TREC. A comparative analysis of the impact of various sources of discourse information on search accuracy is conducted. An open source plug-in for SOLR is developed so that the proposed technology can be easily integrated with industrial search engines.
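The re-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a toy representation in which a parse thicket is a list of labeled edges (head, relation, dependent), and it approximates the maximal common sub-graph by the number of shared labeled edges. The names `common_edges` and `rerank`, the relation labels, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy representation: a parse thicket is a list of labeled edges
# (head, relation, dependent). Real thickets would be built by a parser plus
# anaphora and rhetoric annotators, which are not shown here.
Edge = Tuple[str, str, str]


def common_edges(a: List[Edge], b: List[Edge]) -> int:
    """Count labeled edges shared by two thickets (multiset intersection).

    This is a crude stand-in for the size of a maximal common sub-graph.
    """
    return sum((Counter(a) & Counter(b)).values())


def rerank(question: List[Edge], candidates: Dict[str, List[Edge]]) -> List[str]:
    """Order candidate answer ids by descending edge overlap with the question.

    sorted() is stable, so ties keep the initial (keyword search) order.
    """
    return sorted(candidates, key=lambda cid: -common_edges(question, candidates[cid]))


# Assumed sample data: syntactic, anaphora, and rhetoric edges mixed together.
question = [
    ("buy", "dobj", "house"),
    ("house", "coref", "it"),
    ("pay", "rst:condition", "buy"),
]
candidates = {
    "a1": [("buy", "dobj", "car")],
    "a2": [("buy", "dobj", "house"), ("pay", "rst:condition", "buy")],
}
print(rerank(question, candidates))  # prints ['a2', 'a1']
```

Counting shared edges ignores connectivity and partial label matches, so it only hints at how structural overlap between question and answer can drive the ordering of an initial keyword-search result list.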
Year: 2017
DOI: 10.1016/j.datak.2016.11.002
Venue: Data Knowl. Eng.
Keywords: Question-answering systems, Discourse analysis, Rhetoric structure, Parse tree, Parse thicket
Field: Web search engine, Data mining, Parse tree, Computer science, Tree kernel, Paragraph, Natural language processing, Artificial intelligence, Pairwise comparison, Question answering, Information retrieval, Parsing, Database, Graph (abstract data type)
DocType: Journal
Volume: 107
Issue: C
ISSN: 0169-023X
Citations: 2
PageRank: 0.38
References: 35
Authors (1)
Name: Boris Galitsky
Order: 1
Citations: 248
PageRank: 37.81