Title | ||
---|---|---|
Analyzing the Effect of Document Representation on Machine Learning Approaches in Multi-Class e-Mail Filtering |
Abstract | ||
---|---|---|
This paper reports on experiments in multi-class document categorization with supervised machine learning techniques. The document collection consists of a set of personal e-mail messages. Two distinct document representation formalisms are employed to characterize these messages, namely a standard word-based approach and a character n-gram document representation. Based on these document representations, the categorization performance of five machine learning approaches is assessed and compared. In principle, both document representations yielded comparable results with the various classifiers. However, the results for the n-gram-based document representation were clearly better in the case of an aggressive feature selection strategy. |
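
The two representations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tokenization rule, the lowercasing, and the n-gram length of 3 are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from collections import Counter
import re


def word_features(text: str) -> Counter:
    """Word-based representation: count lowercase alphanumeric tokens.

    The tokenization rule here is an assumption, not taken from the paper.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def char_ngram_features(text: str, n: int = 3) -> Counter:
    """Character n-gram representation: count overlapping n-character windows.

    Whitespace is collapsed to single spaces first; n=3 is an assumed default.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


if __name__ == "__main__":
    message = "Meeting moved to Monday"
    print(word_features(message))
    print(char_ngram_features(message, n=3).most_common(5))
```

Either counter can be turned into a fixed-length feature vector for a classifier by choosing a vocabulary of retained features; restricting that vocabulary to a small subset corresponds to the aggressive feature selection the abstract refers to.
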
Year | DOI | Venue |
---|---|---|
2006 | 10.1109/WI.2006.41 | Web Intelligence |
Keywords | Field | DocType |
character n-gram document representation,n-gram-based document representation,multi-class document categorization,supervised machine,aggressive feature selection strategy,categorization performance,distinct document representation formalisms,comparable result,multi-class e-mail filtering,document representation,machine learning approaches,document collection,machine learning,text analysis,learning artificial intelligence,feature selection | Categorization,Data mining,Information retrieval,Feature selection,Computer science,Document clustering,Document layout analysis,Filter (signal processing),Document representation,Artificial intelligence,Rotation formalisms in three dimensions,Machine learning | Conference |
ISBN | Citations | PageRank |
0-7695-2747-7 | 0 | 0.34 |
References | Authors | |
6 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Helmut Berger | 1 | 38 | 3.63 |
Michael Dittenbach | 2 | 297 | 26.48 |
Dieter Merkl | 3 | 846 | 115.65 |