Title
Enriching Documents by Linking Salient Entities and Lexical-Semantic Expansion
Abstract
This paper explores a multi-strategy technique for enriching text documents to improve clustering quality. We combine entity linking and document summarization to identify the most salient entities mentioned in a text. To enrich documents without introducing noise, we restrict enrichment to the text fragments that mention these salient entities, which in turn belong to a knowledge base such as Wikipedia; the enrichment of those fragments is carried out using WordNet. To feed the clustering algorithms, we investigate different document representations obtained from several combinations of document enrichment and feature extraction. This allows us to exploit ensemble clustering, combining multiple clustering results obtained from the different document representations. Our experiments indicate that our novel enrichment strategies, combined with ensemble clustering, can improve the quality of classical text clustering on text corpora such as British Broadcasting Corporation (BBC) News.
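The two ideas in the abstract, selective enrichment and ensemble clustering, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: `TOY_SYNONYMS` is a hypothetical hand-written table standing in for WordNet, plain substring matching stands in for entity linking and salience detection, and the consensus step uses a simple co-association vote, which is one common ensemble technique and not necessarily the one used in the paper. The function names `enrich` and `consensus_clusters` are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for WordNet lookups: a tiny hand-written synonym table.
TOY_SYNONYMS = {
    "win": ["victory", "triumph"],
    "match": ["game"],
    "shares": ["stock"],
}


def enrich(sentences, salient_entities, synonyms=TOY_SYNONYMS):
    """Build a bag of words, expanding only sentences that mention a salient entity."""
    tokens = []
    for sentence in sentences:
        words = sentence.lower().split()
        tokens.extend(words)
        # Assumption: substring matching replaces real entity linking.
        if any(entity.lower() in sentence.lower() for entity in salient_entities):
            for word in words:
                tokens.extend(synonyms.get(word, []))
    return Counter(tokens)


def consensus_clusters(labelings, threshold=0.5):
    """Group documents that share a cluster in more than `threshold` of the labelings."""
    n = len(labelings[0])
    parent = list(range(n))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        agreement = sum(lab[i] == lab[j] for lab in labelings) / len(labelings)
        if agreement > threshold:
            parent[find(j)] = find(i)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


doc = ["Arsenal win the match", "Fans celebrate the win"]
bag = enrich(doc, salient_entities=["Arsenal"])

# Three labelings of the same four documents, e.g. from different representations.
labelings = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 1]]
consensus = consensus_clusters(labelings)
```

In this sketch only the first sentence is expanded, because it is the only one that mentions the salient entity; the second sentence contributes its own words unchanged. The consensus step ignores the arbitrary label values of each run and looks only at which documents were grouped together.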
Year: 2020
DOI: 10.1515/jisys-2018-0098
Venue: Journal of Intelligent Systems
Keywords: Document clustering, document enriching
Field: Computer science, Natural language processing, Artificial intelligence, Salient
DocType: Journal
Volume: 29
Issue: 1
ISSN: 0334-1860
Citations: 0
PageRank: 0.34
References: 6
Authors (2)
Name               Order  Citations  PageRank
Mohsen Pourvali    1      10         1.89
Salvatore Orlando  2      1595       202.29