Title
Learning To Differentiate Between Main-Articles And Sub-Articles In Wikipedia
Abstract
Current Wikipedia editing approaches typically summarize a named entity in one main-article supplemented by multiple sub-articles that describe various aspects and subtopics of the entity. This separation of articles aims to improve the curation of content-rich Wikipedia entities. However, a wide range of Wikipedia-based technologies critically rely on the article-as-concept assumption, which requires a one-to-one mapping between entities (or concepts) and the articles that describe them. The current editing approaches therefore introduce confusion and ambiguity into knowledge representation and cause problems for a wide range of downstream technologies. In this paper, we present an approach that resolves these problems by differentiating the main-article from the sub-articles that are not at the core of entity representations. We propose a hybrid neural article model that learns on two facets of a Wikipedia article: (i) two neural document encoders capture the latent semantic features from the article title and text contents; (ii) a set of explicit features measures and characterizes the symbolic and structural aspects of each article. In this study, we use crowdsourcing to create a large annotated dataset for feature extraction and for evaluating a variety of encoding techniques and learning structures. The optimized model so derived identifies main-articles with near-perfect precision and recall, and outperforms various baselines on the contributed dataset.
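The two-facet idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' model: the token-hashing `toy_encoder` merely stands in for a neural document encoder, the three `explicit_features` are hypothetical, and joining the facets by concatenation before a logistic scorer is one plausible choice that the abstract does not confirm.

```python
import math
from typing import List


def toy_encoder(text: str, dim: int = 8) -> List[float]:
    """Stand-in for a neural document encoder (assumption):
    hashes tokens into a fixed-size, L2-normalized vector."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        # Deterministic bucket index; avoids Python's randomized hash().
        vec[sum(ord(c) for c in tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def explicit_features(title: str, text: str) -> List[float]:
    """Hypothetical symbolic features, chosen only for illustration."""
    return [
        float(len(title.split())),                       # title length in tokens
        1.0 if " of " in f" {title.lower()} " else 0.0,  # "X of Y" title pattern
        math.log1p(len(text.split())),                   # log-scaled body length
    ]


def hybrid_representation(title: str, text: str, dim: int = 8) -> List[float]:
    """Concatenate title encoding, text encoding, and explicit features."""
    return toy_encoder(title, dim) + toy_encoder(text, dim) + explicit_features(title, text)


def main_article_score(features: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Logistic score in (0, 1), computed in a numerically stable way."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


rep = hybrid_representation("History of France", "France has a long and varied history")
untrained = main_article_score(rep, [0.0] * len(rep))  # all-zero weights as a placeholder
```

In a trained system the weights would be learned from labeled main-article and sub-article examples; the all-zero weights above are only a placeholder, so the score carries no information about the article.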
Year: 2019
DOI: 10.1109/BigData47090.2019.9005578
Venue: 2019 IEEE International Conference on Big Data (Big Data)
Field: Knowledge representation and reasoning, Computer science, Crowdsourcing, Precision and recall, Named entity, Feature extraction, Artificial intelligence, Natural language processing, Encoder, Ambiguity, Machine learning, Encoding (memory)
DocType: Conference
ISSN: 2639-1589
Citations: 0
PageRank: 0.34
References: 0
Authors
4
Name
Order
Citations
PageRank
Muhao Chen18320.01
Changping Meng2331.89
Gang Huang343.78
Carlo Zaniolo443051447.58