Title
An Interactive Visual Analytics System for Incremental Classification Based on Semi-supervised Topic Modeling
Abstract
Text labeling for classification is a time-consuming and unintuitive process. Given an unannotated text collection, it is difficult for users to determine which labels to create and how to label the initial training set for classification. We therefore present an interactive visual analytics system for incremental text classification based on a semi-supervised topic modeling method, a modified Gibbs sampling maximum entropy discrimination latent Dirichlet allocation (Gibbs MedLDA). Given a text collection, Gibbs MedLDA generates topics that summarize it. We design a scatter plot that displays documents and topics simultaneously to convey the topic information, which helps users explore the structure of the text collection and identify labels to create. After documents are labeled, Gibbs MedLDA is applied again to the text collection with its labels, and it generates both topic and classification information. We also provide a scatter plot with the classifier boundary and a matrix view that presents the weights of the classifiers. Users can iteratively label documents to refine each classifier. We evaluate our system via a user study with a benchmark corpus for text classification and case studies with two unannotated text collections.
Year: 2019
DOI: 10.1109/PacificVis.2019.00025
Venue: 2019 IEEE Pacific Visualization Symposium (PacificVis)
Keywords: topic-modeling, visual-analytics, text-classification
Field: Latent Dirichlet allocation, Task analysis, Computer science, Visual analytics, Natural language processing, Artificial intelligence, Topic model, Principle of maximum entropy, Classifier (linguistics), Scatter plot, Gibbs sampling
DocType: Conference
ISSN: 2165-8765
ISBN: 978-1-5386-9227-1
Citations: 0
PageRank: 0.34
References: 16
Authors: 5
Name, Order, Citations, PageRank
Yuyu Yan, 1, 16, 2.90
Yubo Tao, 2, 109, 22.51
Sichen Jin, 3, 0, 0.68
Jin Xu, 4, 6, 1.75
Hai Lin, 5, 142, 29.61