Title
Harvesting More Answer Spans from Paragraph beyond Annotation
Abstract
Automatic Answer span Extraction (AE) focuses on identifying key information in paragraphs that can be asked about. It has been used to facilitate downstream question generation tasks and data augmentation for question answering. Current work on AE relies heavily on the annotated answer spans from Machine Reading Comprehension (MRC) datasets. However, these methods suffer from the partial annotation problem caused by the annotation protocols of MRC tasks. To tackle this problem, we propose SCOPE, a Structured Context graph network with Positive-unlabeled learning. SCOPE first represents the paragraph by constructing a graph with both syntactic and semantic edges, then adopts a unified pointer network for answer span identification. SCOPE narrows the discrepancy between AE and MRC by formulating AE as a Positive-Unlabeled (PU) learning problem, thus recovering more answer spans from paragraphs. To evaluate newly extracted spans without annotation, we also present an automatic metric from the perspective of question answering and text summarization, which correlates well with human judgments. Comprehensive experiments on both AE and downstream tasks demonstrate the effectiveness of our proposed framework. Our code is available at https://github.com/iambabao/SCOPE.
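The abstract does not spell out the PU objective used, so the following is only an illustrative sketch of what formulating a task as positive-unlabeled learning can look like. It assumes a generic non-negative PU risk estimator with a sigmoid surrogate loss, which may differ from the objective in the paper. The names `nn_pu_risk`, `pos_scores`, `unl_scores`, and `prior` are hypothetical: annotated answer spans would play the role of positives, and all other candidate spans would be treated as unlabeled rather than negative.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss sigmoid(-label * score) for label in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk estimate (illustrative, not the paper's loss).

    pos_scores: model scores for labeled positive examples
    unl_scores: model scores for unlabeled examples
    prior:      assumed fraction of positives among all examples
                (a hypothetical hyperparameter in this sketch)
    """
    # Risk of positives when treated as positive.
    pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    # Risk of positives when treated as negative.
    pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    # Risk of unlabeled examples when treated as negative.
    unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])
    # Estimated negative risk: unlabeled data minus its positive share.
    # Clamping at zero keeps the estimate from going negative.
    neg_risk = max(0.0, unl_as_neg - prior * pos_as_neg)
    return prior * pos_as_pos + neg_risk


if __name__ == "__main__":
    # Toy scores: higher means "more likely an answer span".
    print(nn_pu_risk([2.0, 1.5], [0.5, -1.0, -2.0], prior=0.3))
```

The point of the design is that unlabeled spans are not penalized as if they were all negatives: the estimator subtracts the expected contribution of hidden positives, which is how a PU formulation leaves room for answer spans that the annotators did not mark.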
Year: 2022
DOI: 10.1145/3488560.3498399
Venue: WSDM
Keywords: Information extraction, Positive-unlabeled learning
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Name            Order  Citations  PageRank
Qiaoben Bao     1      0          0.68
Jiangjie Chen   2      0          1.35
Linfang Liu     3      0          0.34
Jingping Liu    4      3          3.43
Jiaqing Liang   5      37         9.59
Yanghua Xiao    6      482        54.90