Title
Keyword-Based Semi-Supervised Text Classification
Abstract
Industrial organizations generate massive volumes of data during their routine business and production activities. Such data may be structured (numerical or categorical), or it may be unstructured and textual. Both kinds of data contain a wealth of knowledge that can help organizations improve their operations. Knowledge can be extracted automatically from structured data with relative ease. Unstructured data, however, must be mined and interpreted manually, which is cumbersome, error-prone, and time-consuming. This paper focuses on how to automatically analyze unstructured text data to extract business value. It proposes a semi-supervised natural language (NL) approach to analyze a corpus of documents associated with accounts receivable disputes at a large corporation. The approach is called semi-supervised because a set of categories, and the keywords associated with each category, are defined in consultation with domain experts. These categories and their keywords are then supplied as input to the algorithm, which automatically classifies the disputes into the pre-defined categories. The performance of the semi-supervised methodology is comparable to that of a random forest, which is a supervised learning approach. The paper discusses the benefit of the semi-supervised approach over supervised learning: a considerable reduction in the manual effort needed to analyze, understand, and label a training data set, without any noticeable degradation in performance.
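The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of classifying a document by expert-supplied category keywords. The category names, keywords, scoring rule (keyword occurrence counts), and the `classify` function are all hypothetical illustrations, not the paper's method.

```python
import re
from collections import Counter

# Hypothetical categories and keywords. In the setting the abstract describes,
# these would be defined in consultation with domain experts.
CATEGORY_KEYWORDS = {
    "pricing": {"price", "discount", "overcharged"},
    "delivery": {"late", "shipment", "damaged"},
    "billing": {"invoice", "duplicate", "tax"},
}


def classify(text, category_keywords=CATEGORY_KEYWORDS, default="unclassified"):
    """Assign text to the category whose keywords occur most often.

    Assumption: a simple count of keyword occurrences is used as the score.
    Documents that match no keyword fall back to `default`.
    """
    # Lowercase the text and count each alphabetic token.
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each category by the total occurrences of its keywords.
    scores = {
        category: sum(tokens[keyword] for keyword in keywords)
        for category, keywords in category_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    print(classify("The shipment arrived late and damaged"))
    print(classify("Customer says the invoice is a duplicate"))
```

No labeled training documents are needed in a sketch like this; the only manual input is the keyword list, which is the kind of effort reduction the abstract attributes to the semi-supervised approach.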
Year
2019
DOI
10.1109/COMPSAC.2019.00067
Venue
2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Vol. 1
DocType
Conference
ISSN
0730-3157
Citations
0
PageRank
0.34
References
0
Authors
3
Name                Order  Citations  PageRank
Karl Severin        1      3          1.08
Swapna S. Gokhale   2      860        77.93
Aldo Dagnino        3      194        21.07