A comparison of data preparation approaches for e-mail categorisation

Paper Info

Title
A comparison of data preparation approaches for e-mail categorisation

Abstract
This paper reports on experiments in multi-class e-mail categorisation with supervised and unsupervised machine learning techniques. To this end, Support Vector Machines, decision tree learners, instance-based classifiers, Naive Bayes classification approaches and Self-Organising Maps were applied. A word-based and a character n-gram document representation approach were employed in order to assess the categorisation performance of the various learning approaches. The results indicate a substantial increase in classification accuracy when e-mail header information is considered in the document representation. To a much lesser degree, word-based document representations are advantageous over n-gram representations.
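
As a rough illustration of the two document representations the abstract contrasts, the following minimal sketch builds word-based and character n-gram count features for a short message. It is not the authors' pipeline: the function names, the tokenisation rule, the n-gram length of 3 and the `with_header` helper (which simply prepends a subject line to the body) are assumptions made for this example only.

```python
import re
from collections import Counter


def word_features(text):
    """Word-based representation: counts of lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def char_ngram_features(text, n=3):
    """Character n-gram representation: counts of every window of n characters."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def with_header(subject, body):
    """Hypothetical helper: fold header information in by prepending the subject."""
    return subject + " " + body


if __name__ == "__main__":
    message = with_header("Project meeting", "The meeting starts at noon.")
    print(word_features(message).most_common(3))
    print(char_ngram_features(message, n=3).most_common(3))
```

Either count dictionary could then be passed to any of the classifiers named in the abstract; how the original experiments weighted or selected features is not stated here.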

Year	DOI	Venue
2007	10.1504/IJIIDS.2007.014946	IJIIDS
Keywords	Field	DocType
character n-gram document representation,data preparation approach,document representation,various learning approach,word-based document representation,e-mail header information,classification accuracy,n-gram representation,multi-class e-mail categorisation,categorisation performance,naive bayes classification approach,feature selection,machine learning,support vector machines	Data mining,Decision tree,Feature selection,Naive Bayes classifier,Computer science,Support vector machine,Unsupervised learning,Artificial intelligence,Header,Linear classifier,Data preparation,Machine learning	Journal
Volume	Issue	Citations
1	2	0
PageRank	References	Authors
0.34	25	3

Authors (3 rows)

Name	Order	Citations	PageRank
Helmut Berger	1	0	0.34
Dieter Merkl	2	846	115.65
Michael Dittenbach	3	297	26.48
