Title
Analyzing the Effect of Document Representation on Machine Learning Approaches in Multi-Class e-Mail Filtering
Abstract
This paper reports on experiments in multi-class document categorization with supervised machine learning techniques. The document collection consists of a set of personal e-mail messages. Two distinct document representation formalisms are employed to characterize these messages, namely a standard word-based approach and a character n-gram document representation. Based on these document representations, the categorization performance of five machine learning approaches is assessed and compared. In principle, both document representations yielded comparable results with the various classifiers. However, the results for the n-gram-based document representation were clearly better when an aggressive feature selection strategy was applied.
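The two document representations named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the tokenization rule, the n-gram length `n=3`, the sample message, and the helper names `word_features`, `char_ngram_features`, and `select_top_k`. In particular, the frequency-based cut-off is only a naive stand-in for "aggressive feature selection"; the abstract does not say which selection criterion the authors used.

```python
import re
from collections import Counter


def word_features(text):
    """Word-based representation: counts of lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def char_ngram_features(text, n=3):
    """Character n-gram representation over lowercased text.

    Runs of whitespace are collapsed to a single space, so n-grams may
    span word boundaries. n=3 is an illustrative choice.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def select_top_k(counts, k):
    """Naive stand-in for feature selection: keep the k most frequent features."""
    return dict(counts.most_common(k))


# Hypothetical e-mail text, used only to exercise the functions.
message = "Meeting moved to Monday; the meeting room is booked."
words = word_features(message)
trigrams = char_ngram_features(message, n=3)
reduced = select_top_k(trigrams, 5)
```

Either feature dictionary could then be turned into a vector and passed to any supervised classifier. Character n-grams need no language-specific tokenizer and share sub-word fragments across inflected forms, which may be one reason such a representation degrades less when the feature set is cut down sharply.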
Year: 2006
DOI: 10.1109/WI.2006.41
Venue: Web Intelligence
Keywords: character n-gram document representation, n-gram-based document representation, multi-class document categorization, supervised machine, aggressive feature selection strategy, categorization performance, distinct document representation formalisms, comparable result, multi-class e-mail filtering, document representation, machine learning approaches, document collection, machine learning, text analysis, learning artificial intelligence, feature selection
Field: Categorization, Data mining, Information retrieval, Feature selection, Computer science, Document clustering, Document layout analysis, Filter (signal processing), Document representation, Artificial intelligence, Rotation formalisms in three dimensions, Machine learning
DocType: Conference
ISBN: 0-7695-2747-7
Citations: 0
PageRank: 0.34
References: 6
Authors: 3
Name, Order, Citations, PageRank
Helmut Berger, 1, 38, 3.63
Michael Dittenbach, 2, 297, 26.48
Dieter Merkl, 3, 846, 115.65