Abstract | ||
---|---|---|
Current Wikipedia editing approaches typically describe a named entity with one main article supplemented by multiple sub-articles that cover various aspects and subtopics of the entity. This separation aims to improve the curation of content-rich Wikipedia entities. However, a wide range of Wikipedia-based technologies critically rely on the article-as-concept assumption, which requires a one-to-one mapping between entities (or concepts) and the articles that describe them. Consequently, current editing approaches introduce confusion and ambiguity into knowledge representation and cause problems for a wide range of downstream technologies. In this paper, we present an approach that resolves these problems by differentiating the main article from the sub-articles, which are not at the core of entity representations. We propose a hybrid neural article model that learns from two facets of a Wikipedia article: (i) two neural document encoders capture latent semantic features from the article title and text content; (ii) a set of explicit features measures and characterizes the symbolic and structural aspects of each article. We use crowdsourcing to create a large annotated dataset for feature extraction and for evaluating a variety of encoding techniques and learning structures. The resulting optimized model identifies main articles with near-perfect precision and recall and outperforms various baselines on the contributed dataset. |
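The abstract describes the model only at the level of its two facets. The sketch below illustrates, under clearly stated assumptions, how such a hybrid representation could be assembled and scored. The two neural encoders are replaced by a deterministic hashed bag-of-words stand-in (`hashed_encoder`). The explicit features in `explicit_features` (exact title match, an "of <entity>" title pattern, title length, and a log-scaled incoming-link count) are hypothetical, not the paper's feature set. The classifier is plain logistic regression rather than the authors' learning structure, and the toy rows are invented. The sketch also computes precision and recall, the metrics the abstract reports.

```python
import math
import zlib
from typing import List, Sequence, Tuple

DIM = 16  # hypothetical size of each stand-in encoder output


def hashed_encoder(text: str, dim: int = DIM) -> List[float]:
    """Stand-in for a neural document encoder (assumption): an L2-normalized
    hashed bag of words. crc32 is used because Python's built-in str hash is
    randomized per process."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def explicit_features(title: str, entity: str, n_links: int) -> List[float]:
    """Hypothetical symbolic/structural features; not the paper's feature set."""
    t, e = title.lower(), entity.lower()
    return [
        1.0 if t == e else 0.0,           # title is exactly the entity name
        1.0 if f" of {e}" in t else 0.0,  # "History of X"-style sub-article title
        float(len(title.split())),        # title length in tokens
        math.log1p(n_links),              # log-scaled incoming-link count (hypothetical)
    ]


def article_vector(title: str, text: str, entity: str, n_links: int) -> List[float]:
    """Hybrid representation: title encoding + text encoding + explicit features."""
    return hashed_encoder(title) + hashed_encoder(text) + explicit_features(title, entity, n_links)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    # Probability that the article is a main article.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(X: List[List[float]], y: List[int], lr: float = 0.1,
                   epochs: int = 200) -> Tuple[List[float], float]:
    """Per-example SGD on log loss; a swapped-in classifier for illustration."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            g = predict_proba(w, b, x) - target  # gradient of log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def precision_recall(pred: Sequence[bool], gold: Sequence[bool]) -> Tuple[float, float]:
    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    fp = sum(1 for p, g in zip(pred, gold) if p and not g)
    fn = sum(1 for p, g in zip(pred, gold) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def toy_dataset() -> Tuple[List[List[float]], List[int]]:
    # Invented examples: (title, text, entity, incoming links, is_main_article).
    rows = [
        ("Paris", "paris is the capital of france", "Paris", 500, 1),
        ("History of Paris", "the history of paris spans many centuries", "Paris", 40, 0),
        ("Economy of Paris", "the economy of paris is large", "Paris", 30, 0),
        ("Tokyo", "tokyo is the capital of japan", "Tokyo", 450, 1),
        ("Geography of Tokyo", "the geography of tokyo includes islands", "Tokyo", 20, 0),
    ]
    X = [article_vector(t, txt, e, n) for t, txt, e, n, _ in rows]
    y = [label for *_, label in rows]
    return X, y


if __name__ == "__main__":
    X, y = toy_dataset()
    w, b = train_logistic(X, y)
    preds = [predict_proba(w, b, x) >= 0.5 for x in X]
    # Scored on the training rows only, just to exercise the pipeline end to end.
    print("precision=%.2f recall=%.2f" % precision_recall(preds, [bool(t) for t in y]))
```

Concatenating the encoder outputs with the explicit features lets a single linear layer weigh latent semantics against symbolic cues. A neural model of the kind the abstract describes would instead learn the encoders jointly with the classifier.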
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/BigData47090.2019.9005578 | 2019 IEEE International Conference on Big Data (Big Data) |
Field | DocType | ISSN |
---|---|---|
Knowledge representation and reasoning, Computer science, Crowdsourcing, Precision and recall, Named entity, Feature extraction, Artificial intelligence, Natural language processing, Encoder, Ambiguity, Machine learning, Encoding (memory) | Conference | 2639-1589 |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors | ||
---|---|---|
4 | | |
Name | Order | Citations | PageRank |
---|---|---|---|
Muhao Chen | 1 | 83 | 20.01 |
Changping Meng | 2 | 33 | 1.89 |
Gang Huang | 3 | 4 | 3.78 |
Carlo Zaniolo | 4 | 4305 | 1447.58 |