Title
Deep truth discovery for pattern-based fact extraction
Abstract
Fact extraction, which aims to extract (entity, attribute, value) tuples from massive text corpora, is crucial in text data mining. Recent approaches extract facts by mining textual patterns with semantic types, and they evaluate the quality of a pattern with content-based criteria such as frequency. However, these approaches overlook pattern reliability, which reflects how likely the facts a pattern extracts are to be correct. As a result, a pattern of good content quality (e.g., high frequency) may still extract incorrect facts. In this study, we consider both pattern reliability and fact trustworthiness in addressing the pattern-based fact extraction problem. To learn the complex relationship between pattern reliability and fact trustworthiness, we propose a novel deep learning model with a hybrid CNN and LSTM architecture. For fact embedding, we adopt a CNN to extract a fixed-size representation of each component of the fact, i.e., the entity, the attribute, and the value. For pattern embedding, we represent the pattern as a semantic composition of the representations of the facts it extracts. To de-emphasize noisy facts, we take fact trustworthiness and frequency into account during pattern embedding, where the features carrying tuple trustworthiness information are extracted by a long short-term memory (LSTM) model. To learn the relational dependency between patterns and facts, we train the model with both pattern and tuple labels. Extensive experiments on three real-world datasets demonstrate that the proposed model significantly improves the quality of both the patterns and the extracted facts in pattern-based information extraction.
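The two embedding steps described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. It assumes that (a) one shared set of convolution filters is applied to the entity, the attribute, and the value (the paper may use separate CNNs), (b) each fact already has a trustworthiness score in [0, 1], standing in for the LSTM-derived trustworthiness features, and (c) fact embeddings are composed into a pattern embedding by an average weighted by trust times frequency. The function names conv1d_max_pool, fact_embedding, and pattern_embedding are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Filter = Sequence[Vector]  # one weight vector per position in the window


def conv1d_max_pool(tokens: Sequence[Vector], filters: Sequence[Filter]) -> Vector:
    """1D convolution + ReLU + max-pooling over positions.

    The output has one value per filter, whatever the number of tokens,
    which is what gives each fact component a fixed-size representation.
    Assumes a non-empty token sequence and filters of equal width.
    """
    width = len(filters[0])
    dim = len(tokens[0])
    # Zero-pad short components so that at least one window fits.
    padded = list(tokens) + [[0.0] * dim for _ in range(max(0, width - len(tokens)))]
    pooled: Vector = []
    for f in filters:
        best = 0.0  # starting at 0.0 applies ReLU before max-pooling
        for start in range(len(padded) - width + 1):
            score = sum(
                w * x
                for k in range(width)
                for w, x in zip(f[k], padded[start + k])
            )
            best = max(best, score)
        pooled.append(best)
    return pooled


def fact_embedding(entity: Sequence[Vector], attribute: Sequence[Vector],
                   value: Sequence[Vector], filters: Sequence[Filter]) -> Vector:
    """Concatenate the pooled vectors of the entity, attribute and value.

    Assumption: the same filters are shared across the three components.
    """
    return (conv1d_max_pool(entity, filters)
            + conv1d_max_pool(attribute, filters)
            + conv1d_max_pool(value, filters))


def pattern_embedding(fact_vecs: Sequence[Vector], trust: Sequence[float],
                      freq: Sequence[float]) -> Vector:
    """Weighted average of fact embeddings with weight = trust * frequency.

    Assumption: trust scores are given directly here; in the paper the
    trustworthiness information comes from features extracted by an LSTM.
    Low-trust facts contribute little even when they are frequent.
    """
    weights = [t * f for t, f in zip(trust, freq)]
    total = sum(weights)
    dim = len(fact_vecs[0])
    if total == 0:
        return [0.0] * dim
    return [sum(w * v[i] for w, v in zip(weights, fact_vecs)) / total
            for i in range(dim)]


if __name__ == "__main__":
    # Two width-2 filters over 2-dimensional token vectors (toy values).
    filters = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, -1.0]]]
    print(conv1d_max_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], filters))  # [9.0, 0.0]
    # A frequent but untrusted fact (second one) is ignored by the pattern.
    print(pattern_embedding([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [1, 10]))  # [1.0, 0.0]
```

In the usage lines, the second fact occurs ten times as often as the first, yet with a trust score of 0 it has no influence on the pattern embedding. This mirrors, in a simplified form, the abstract's point that a high-frequency pattern or fact is not necessarily a reliable one.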
Year: 2021
DOI: 10.1016/j.ins.2021.08.084
Venue: Information Sciences
Keywords: Fact extraction, Textual patterns, Deep learning, Neural network
DocType: Journal
Volume: 580
ISSN: 0020-0255
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name           Order  Citations  PageRank
Chen Ye        1      8          4.16
Hongzhi Wang   2      421        73.72
Wenbo Lu       3      0          0.68
Jing Gao       4      0          0.34
Guojun Dai     5      472        41.96