Title
Facilitating pattern discovery for relation extraction with semantic-signature-based clustering
Abstract
Hand-crafted textual patterns have been the mainstay device of practical relation extraction for decades. However, there has been little work on reducing the manual effort involved in discovering effective textual patterns for relation extraction. In this paper, we propose a clustering-based approach to facilitate pattern discovery for relation extraction. Specifically, we define the notion of a semantic signature to represent the most salient features of a textual fragment. We then propose a novel clustering algorithm based on semantic signatures, S2C, and its enhancement, S2C+. Experiments on two real-world data sets show that, compared with k-means clustering, S2C and S2C+ are at least an order of magnitude faster, while generating high-quality clusters that are at least comparable to the best clusters generated by k-means, without requiring any manual tuning. Finally, a user study confirms that our clustering-based approach can indeed help users discover effective textual patterns for relation extraction with only a fraction of the manual effort required by the conventional approach.
Year
2011
DOI
10.1145/2063576.2063781
Venue
CIKM
Keywords
semantic signature, manual effort, facilitating pattern discovery, hand-crafted textual pattern, conventional approach, enhancement s2c, semantic-signature-based clustering, clustering-based approach, effective textual pattern, practical relation extraction, textual fragment, relation extraction, clustering, k means, information extraction, k means clustering
Field
Fuzzy clustering, Data mining, CURE data clustering algorithm, Computer science, Document clustering, Artificial intelligence, Cluster analysis, Relationship extraction, Canopy clustering algorithm, Data stream clustering, Pattern recognition, Information retrieval, Correlation clustering
DocType
Conference
Citations
6
PageRank
0.62
References
31
Authors
5
Name              Order  Citations  PageRank
Yunyao Li         1      530        37.81
Vivian Chu        2      74         4.67
Sebastian Blohm   3      49         4.02
Huaiyu Zhu        4      62         3.08
Howard Ho         5      337        19.47