A two-stage discretization algorithm based on information entropy. - Citegraph

Paper Info

Title
A two-stage discretization algorithm based on information entropy.

Abstract
Discretization is an important and difficult preprocessing task for data mining and knowledge discovery. Although there are numerous discretization approaches, many suffer from certain drawbacks. Local approaches are efficient, but their generalization ability is weak. Global approaches consider all attributes simultaneously, but they have high time and space complexities. In this paper, we propose a two-stage discretization (TSD) algorithm based on information entropy. In the local discretization stage, we independently select strong cuts for each attribute to minimize conditional entropy. The goal is to rapidly reduce the cardinality of the attributes with minor information loss. In the global discretization stage, cuts for all attributes are considered simultaneously to form a scaled decision system, and finally the minimal cut set that preserves the positive region is selected. We tested the new algorithm and seven popular algorithms on 28 datasets. Compared with the other approaches, our algorithm has the best generalization ability, good information-preserving ability, the highest classification accuracy, and reasonable time consumption.
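The abstract describes the two stages only at a high level, so the Python sketch below shows one way they could fit together. It is not the authors' TSD implementation. Assumptions: candidate cuts are midpoints between adjacent distinct values; "strong cuts" are approximated by greedily adding the cut that most lowers the conditional entropy H(D | a), with hypothetical `max_cuts` and `min_gain` stopping parameters; and the global stage uses a greedy forward pass followed by a backward pruning pass. That combination yields an irreducible cut set that preserves the positive region, but it is not guaranteed to be the minimum one. All function names are hypothetical.

```python
import math
from bisect import bisect_left
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def conditional_entropy(values, labels, cuts):
    """H(D | a): decision entropy given the attribute discretized by sorted `cuts`."""
    groups = defaultdict(list)
    for v, label in zip(values, labels):
        groups[bisect_left(cuts, v)].append(label)  # interval index of v
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def local_stage(values, labels, max_cuts=3, min_gain=1e-9):
    """Stage 1 (one attribute at a time, assumed greedy criterion): add the
    midpoint cut that lowers H(D | a) the most; stop when the gain is
    negligible or max_cuts is reached."""
    distinct = sorted(set(values))
    candidates = [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]
    cuts, current = [], entropy(labels)
    while candidates and len(cuts) < max_cuts:
        best = min(candidates,
                   key=lambda c: conditional_entropy(values, labels, sorted(cuts + [c])))
        h = conditional_entropy(values, labels, sorted(cuts + [best]))
        if current - h < min_gain:
            break
        cuts, current = sorted(cuts + [best]), h
        candidates.remove(best)
    return cuts


def positive_region(X, y, cuts):
    """Indices of objects whose equivalence class in the scaled decision
    system is label-pure. `cuts` maps attribute index -> sorted cut list."""
    classes = defaultdict(list)
    for i, row in enumerate(X):
        key = tuple(bisect_left(cuts[a], v) for a, v in enumerate(row))
        classes[key].append(i)
    return {i for members in classes.values()
            if len({y[j] for j in members}) == 1 for i in members}


def global_stage(X, y, local_cuts):
    """Stage 2 (all attributes at once): keep a small subset of the local
    cuts that preserves the positive region of the full candidate set."""
    pool = [(a, c) for a, cs in local_cuts.items() for c in cs]

    def as_dict(chosen):
        return {a: sorted(c for b, c in chosen if b == a) for a in local_cuts}

    def pos_size(chosen):
        return len(positive_region(X, y, as_dict(chosen)))

    target = pos_size(pool)  # more cuts never shrink POS, so size equality suffices
    chosen = []
    # Forward pass: add the cut that enlarges the positive region the most.
    while pos_size(chosen) < target:
        rest = [cut for cut in pool if cut not in chosen]
        chosen.append(max(rest, key=lambda cut: pos_size(chosen + [cut])))
    # Backward pass: drop any cut whose removal keeps the positive region intact.
    for cut in list(chosen):
        trial = [c for c in chosen if c != cut]
        if pos_size(trial) == target:
            chosen = trial
    return as_dict(chosen)


# Toy decision system: two real-valued attributes, binary decision.
X = [[1.0, 5.0], [2.0, 6.0], [3.0, 5.5], [4.0, 7.0]]
y = [0, 0, 1, 1]
local_cuts = {a: local_stage([row[a] for row in X], y) for a in range(len(X[0]))}
print(local_cuts)                      # {0: [2.5], 1: [5.25, 5.75, 6.5]}
print(global_stage(X, y, local_cuts))  # {0: [2.5], 1: []}
```

Adding cuts only refines the partition of the objects, so the positive region can only grow; that is why the sketch compares positive-region sizes rather than whole sets. In the toy data, attribute 1 keeps three cuts after the local stage, yet the global stage discards all of them because the single cut 2.5 on attribute 0 already separates the two classes. This illustrates why a global pass over all attributes can shrink a locally chosen cut set.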

Year: 2017
DOI: https://doi.org/10.1007/s10489-017-0941-0
Venue: Appl. Intell.
Keywords: Classification, Discretization, Information entropy, Real-value attribute, Scaling
Field: Cut, Discretization, Computer science, Cardinality, Artificial intelligence, Entropy (information theory), Mathematical optimization, Algorithm, Preprocessor, Knowledge extraction, Conditional entropy, Machine learning, Discretization of continuous features
DocType: Journal
Volume: 47
Issue: 4
ISSN: 0924-669X
Citations: 1
PageRank: 0.35
References: 35
Authors: 3

Authors

Name	Order	Citations	PageRank
Liu-Ying Wen	1	6	0.72
Fan Min	2	60	5.78
Shiyuan Wang	3	2	1.46
