Title
On effective e-mail classification via neural networks
Abstract
To address the growing problem of junk E-mail on the Internet, this paper proposes an effective E-mail classifying and cleansing method. E-mail messages can be modelled as semi-structured documents consisting of a set of fields with pre-defined semantics and a number of variable-length free-text fields. Our proposed method handles both kinds of field in order to obtain higher accuracy. The main contributions of this work are two-fold. First, we present a new model based on the Neural Network (NN) for classifying personal E-mails. In particular, we treat E-mail files as a particular kind of plain-text file, which implies that our feature set is relatively large (since there are thousands of different terms across different E-mail files). Second, we propose the use of Principal Component Analysis (PCA) as a preprocessor for the NN to reduce the data in both size and dimensionality, so that the input data become more classifiable and the training process of the NN model converges faster. The results of our performance evaluation demonstrate that the proposed algorithm filters E-mail effectively with reasonable accuracy.
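The pipeline outlined in the abstract (term features, then PCA, then a neural classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the six toy messages, the whitespace tokenizer, PCA computed by power iteration with deflation, and a single sigmoid unit standing in for the NN, whose architecture the abstract does not specify. All function names (vectorize, pca_fit, pca_transform, train_neuron, predict) are hypothetical.

```python
import math
import random

def vectorize(text, vocab):
    """Term-count vector of one message over a fixed vocabulary."""
    index = {term: j for j, term in enumerate(vocab)}
    row = [0.0] * len(vocab)
    for term in text.lower().split():
        if term in index:              # terms unseen in training are ignored
            row[index[term]] += 1.0
    return row

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pca_fit(rows, k, iters=100, seed=0):
    """Return (mean, components): the top-k principal directions, found by
    power iteration with deflation on the centred data."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            for c in components:       # deflation: drop directions already found
                proj = dot(v, c)
                v = [a - proj * b for a, b in zip(v, c)]
            scores = [dot(row, v) for row in centred]
            # w = X^T (X v): the (unscaled) covariance matrix applied to v
            w = [sum(scores[i] * centred[i][j] for i in range(n)) for j in range(d)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:            # no variance left in the remaining subspace
                break
            v = [a / norm for a in w]
        components.append(v)
    return mean, components

def pca_transform(row, mean, components):
    """Project one term-count vector onto the principal directions."""
    centred = [x - m for x, m in zip(row, mean)]
    return [dot(centred, c) for c in components]

def train_neuron(features, labels, epochs=200, lr=0.5):
    """Single sigmoid unit trained by batch gradient descent on log loss.
    Assumption: a stand-in for the paper's NN, not its actual architecture."""
    w, b = [0.0] * len(features[0]), 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(features, labels):
            err = 1.0 / (1.0 + math.exp(-(dot(w, x) + b))) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(x, w, b):
    """1 = junk, 0 = legitimate."""
    return 1 if dot(w, x) + b > 0.0 else 0

# Toy corpus (invented for illustration only).
spam = ["win free money now", "free money offer win", "win money free prize"]
ham = ["project meeting report today", "meeting report project notes",
       "project report meeting agenda"]
docs, labels = spam + ham, [1, 1, 1, 0, 0, 0]

vocab = sorted({t for d in docs for t in d.lower().split()})
rows = [vectorize(d, vocab) for d in docs]
mean, components = pca_fit(rows, k=2)          # 12 term features -> 2 inputs
features = [pca_transform(r, mean, components) for r in rows]
w, b = train_neuron(features, labels)
predictions = [predict(f, w, b) for f in features]
```

In this sketch the classifier sees two inputs per message instead of one input per vocabulary term, which is the kind of reduction the abstract credits with faster training. A new message is classified by passing it through the same vectorize and pca_transform steps before calling predict.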
Year
2005
DOI
10.1007/11546924_9
Venue
Lecture Notes in Computer Science
Keywords
different term,variable length free-text field,nn model,feature set,pre-defined semantics,neural network,junk e-mail,effective e-mail classifying,effective e-mail classification,different e-mail file,higher accuracy,e-mail file,database,filtering,model based reasoning,artificial intelligence,classification,internet,principal component analysis,semantics,preprocessor,dimensionality,free field,modeling
Field
Convergence (routing),Data mining,Computer science,Model-based reasoning,Artificial intelligence,Artificial neural network,Filter (signal processing),Curse of dimensionality,Preprocessor,Plain text,Database,Machine learning,Principal component analysis
DocType
Conference
Volume
3588
ISSN
0302-9743
ISBN
3-540-28566-0
Citations
7
PageRank
0.69
References
9
Authors
5
Name, Order, Citations, PageRank
Bin Cui, 1, 1843, 124.59
Anirban Mondal, 2, 386, 31.29
Jialie Shen, 3, 1856, 79.31
Gao Cong, 4, 4086, 169.93
Kian-Lee Tan, 5, 6962, 776.65