Title
A two-stage discretization algorithm based on information entropy.
Abstract
Discretization is an important and difficult preprocessing task for data mining and knowledge discovery. Although there are numerous discretization approaches, many suffer from certain drawbacks. Local approaches are efficient, but their generalization ability is weak. Global approaches consider all attributes simultaneously, but they have high time and space complexities. In this paper, we propose a two-stage discretization (TSD) algorithm based on information entropy. In the local discretization stage, we independently select strong cuts for each attribute to minimize conditional entropy. The goal is to rapidly reduce the cardinality of the attributes, with minor information loss. In the global discretization stage, cuts for all attributes are considered simultaneously to form a scaled decision system. The minimal cut set that preserves the positive region is finally selected. We tested the new algorithm and seven popular algorithms on 28 datasets. Compared with other approaches, our algorithm has the best generalization ability, with a good information preserving ability, the highest classification accuracy, and reasonable time consumption.
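The local stage described in the abstract selects cuts per attribute by minimizing conditional entropy. The following is a minimal sketch of that general idea for a single attribute and a single cut. It is not the authors' TSD algorithm: the function names (`entropy`, `conditional_entropy`, `best_cut`), the choice of midpoints between adjacent distinct values as candidate cuts, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(values, labels, cut):
    """Entropy of the class given which side of `cut` each value falls on."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    # Weight each interval's entropy by the fraction of objects it contains.
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


def best_cut(values, labels):
    """Pick the candidate cut with minimal conditional entropy.

    Candidates are assumed to be midpoints between adjacent distinct values.
    Returns None when the attribute has fewer than two distinct values.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None
    return min(candidates, key=lambda c: conditional_entropy(values, labels, c))


# Hypothetical toy attribute: two well-separated classes.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
cut = best_cut(values, labels)
```

On this toy data the midpoint between 3.0 and 10.0 separates the two classes completely, so it is the cut with zero conditional entropy. A full discretizer would select several cuts per attribute and, as the abstract notes, follow up with a global stage that considers all attributes together; neither is shown here.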
Year: 2017
DOI: https://doi.org/10.1007/s10489-017-0941-0
Venue: Appl. Intell.
Keywords: Classification, Discretization, Information entropy, Real-value attribute, Scaling
Field: Cut, Discretization, Computer science, Cardinality, Artificial intelligence, Entropy (information theory), Mathematical optimization, Algorithm, Preprocessor, Knowledge extraction, Conditional entropy, Machine learning, Discretization of continuous features
DocType: Journal
Volume: 47
Issue: 4
ISSN: 0924-669X
Citations: 1
PageRank: 0.35
References: 35
Authors: 3
Name | Order | Citations | PageRank
Liu-Ying Wen | 1 | 6 | 0.72
Fan Min | 2 | 60 | 5.78
Shiyuan Wang | 3 | 2 | 1.46