Title
Hierarchical Classification of Documents with Error Control
Abstract
Classification assigns a new object to one of a set of predefined classes. Document classification is characterized by the large number of attributes that describe each object (document). Building a single classifier to do all the classification work, the traditional method, incurs a high overhead. Hierarchical classification is more efficient: instead of a single classifier, it uses a set of classifiers distributed over a class taxonomy, one for each internal node. However, a misclassification at a high-level class can lead to a final class that is far from the correct one. An existing approach to this problem requires the terms to be arranged hierarchically as well. In this paper, instead of overhauling the classifier itself, we propose mechanisms that detect misclassification and take appropriate action. We then discuss an alternative that masks misclassification using a well-known software fault tolerance technique. Our experiments show that our algorithms offer a good trade-off between speed and accuracy in most applications.
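The top-down scheme the abstract describes, with one classifier per internal node of a class taxonomy, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the keyword-count scoring, the margin-based rejection rule, and the names Node, score, classify, TAXONOMY, and min_margin are all hypothetical, chosen only to show how an early wrong turn could be flagged rather than propagated down the tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A taxonomy node; internal nodes route documents to one of their children."""
    name: str
    keywords: List[str] = field(default_factory=list)  # toy stand-in for a trained model
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def score(node: Node, tokens: List[str]) -> int:
    """Toy score: count tokens matching keywords anywhere in this node's subtree."""
    own = sum(tokens.count(k) for k in node.keywords)
    return own + sum(score(child, tokens) for child in node.children)


def classify(root: Node, text: str, min_margin: int = 1) -> Optional[List[str]]:
    """Walk from the root to a leaf, choosing the best child at each internal node.

    Returns the path of class names, or None when the best child does not beat
    the runner-up by at least min_margin (an assumed rejection rule, used here
    as a placeholder for misclassification detection).
    """
    tokens = text.lower().split()
    path: List[str] = []
    node = root
    while not node.is_leaf():
        ranked = sorted(node.children, key=lambda c: score(c, tokens), reverse=True)
        best = score(ranked[0], tokens)
        second = score(ranked[1], tokens) if len(ranked) > 1 else 0
        if best - second < min_margin:
            return None  # ambiguous decision: flag it instead of descending further
        node = ranked[0]
        path.append(node.name)
    return path


# Hypothetical two-level taxonomy used only for this illustration.
TAXONOMY = Node("root", children=[
    Node("science", children=[
        Node("physics", keywords=["quantum", "particle"]),
        Node("biology", keywords=["cell", "gene"]),
    ]),
    Node("sports", children=[
        Node("football", keywords=["goal", "league"]),
        Node("tennis", keywords=["serve", "racket"]),
    ]),
])

print(classify(TAXONOMY, "the gene regulates cell growth"))
print(classify(TAXONOMY, "weather report today"))
```

The first call descends root, science, biology and returns that path; the second matches no keywords, so the root-level decision has zero margin and the document is flagged with None. Each decision compares only the children of one node, which is where the efficiency gain over a single flat classifier would come from.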
Year
2001
DOI
10.1007/3-540-45357-1_46
Venue
PAKDD
Keywords
traditional method,classification work,single classifier,class taxonomy,high level class,error control,efficient method,predefined class,hierarchical classification,document classification,high overhead,parallel algorithm,software fault tolerance
Field
Data mining,One-class classification,Computer science,Class (biology),Artificial intelligence,Classifier (linguistics),Document classification,Pattern recognition,Parallel algorithm,Software fault tolerance,Fault tolerance,Linear classifier,Machine learning
DocType
Conference
ISBN
3-540-41910-1
Citations
10
PageRank
0.63
References
17
Authors
4
Name, Order, Citations/PageRank
Chun Hung Cheng, 1, 50839.17
Jian Tang, 2, 526148.30
Ada Wai-Chee Fu, 3, 4646417.59
Irwin King, 4, 6751325.94