Abstract | ||
---|---|---|
Comprehending the structure and content of large text corpora can be a daunting and often time-consuming task. In this paper, we introduce a novel tool that exploits the structural properties of a corpus to extract and visualize the underlying topics in a given dataset. To this end, we combine latent topic analysis, discriminative feature selection applied on top of the category structure of corpora, and various ranking methods in order to extract the most representative topics for a given corpus. The visual moniker used to depict the outcome of these methods can be chosen based on the context. Such visual representations can be useful for depicting trends, identifying "hot" topics, and discovering interesting patterns in the underlying data. As applications, we create example representations for a variety of corpora obtained from conference proceedings, movie summaries, and newsgroup postings. Our user experiments demonstrate the viability of our approach, with a flower-like visualization inspired by the "wheel of emotion", for generating high-quality representative topics and for unearthing hidden structures and connections in large document corpora. |
Year | DOI | Venue |
---|---|---|
2017 | 10.1145/3020165.3020182 | CHIIR |
Field | DocType | Citations |
Feature selection, Information retrieval, Ranking, Visualization, Computer science, Text corpus, Exploit, Mutual information, Artificial intelligence, Natural language processing, Topic analysis, Discriminative model | Conference | 0 |
PageRank | References | Authors |
0.34 | 30 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jaspreet Singh Suri | 1 | 337 | 29.90 |
Sergej Zerr | 2 | 158 | 15.85 |
Stefan Siersdorfer | 3 | 643 | 34.70 |