Keyword-Based Semi-Supervised Text Classification - Citegraph

Paper Info

Title
Keyword-Based Semi-Supervised Text Classification

Abstract
Industrial organizations generate massive volumes of data during their routine business and production activities. Such data may be structured (numerical or categorical), or it may be unstructured and textual. Both structured and unstructured data contain a wealth of knowledge that can help organizations improve their operations. Organizations find it easy to automatically extract knowledge from structured data. Unstructured data, however, must be mined and interpreted manually, which is cumbersome, error-prone, and time-consuming. This paper focuses on how to automatically analyze unstructured text data to extract important business value. It proposes a semi-supervised natural language (NL) approach to analyze a corpus of documents associated with accounts receivable disputes at a large corporation. The name semi-supervised derives from the philosophy underlying the methodology, where a set of categories and the keywords associated with these categories are defined in consultation with the domain experts. Subsequently, these categories and their associated keywords are supplied as input to the algorithm, which classifies the disputes automatically into these pre-defined categories. The performance of the semi-supervised methodology is comparable to that of a random forest, which is a supervised learning approach. The paper discusses the benefits of the semi-supervised approach over supervised learning; namely, a considerable reduction in the manual effort to analyze, understand, and label a training data set, without any noticeable degradation in performance.
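
The abstract describes an algorithm that takes expert-defined categories and their keywords as input and assigns each dispute to a category. The abstract does not give the actual scoring rule, so the following is only a minimal sketch of one plausible keyword-matching scheme, not the authors' method. The category names, keywords, the `classify` function, and the "uncategorized" fallback are all hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical categories and keywords; the real ones were defined
# with domain experts and are not listed in the abstract.
CATEGORY_KEYWORDS = {
    "pricing": {"price", "discount", "overcharged"},
    "delivery": {"late", "shipment", "damaged"},
    "duplicate": {"duplicate", "twice"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text, category_keywords=CATEGORY_KEYWORDS, default="uncategorized"):
    """Assign the category whose keywords occur most often in the text.

    Assumption: a simple keyword-occurrence count is used as the score.
    Ties go to the category listed first; texts that match no keyword
    fall back to `default`.
    """
    counts = Counter(tokenize(text))
    scores = {
        category: sum(counts[keyword] for keyword in keywords)
        for category, keywords in category_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    for dispute in [
        "Customer was overcharged; the agreed discount is missing",
        "Invoice was sent twice",
        "Please call me back",
    ]:
        print(classify(dispute), "<-", dispute)
```

No labeled training set is needed in this sketch: the only manual input is the keyword list per category, which is the source of the reduced labeling effort the abstract refers to.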

Year	2019
DOI	10.1109/COMPSAC.2019.00067
Venue	2019 IEEE 43RD ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), VOL 1
DocType	Conference
ISSN	0730-3157
Citations	0
PageRank	0.34
References	0
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Karl Severin	1	3	1.08
Swapna S. Gokhale	2	860	77.93
Aldo Dagnino	3	194	21.07
