Title
Using biased discriminant analysis for email filtering
Abstract
This paper reports on email filtering based on content features. We test the validity of a novel statistical feature extraction method that relies on dimensionality reduction to retain the most informative and discriminative features of messages. The approach, named Biased Discriminant Analysis (BDA), aims to find a feature space transformation that closely clusters the positive examples while pushing the negative ones away. The method is an extension of Linear Discriminant Analysis (LDA) that introduces a different transformation to improve the separation between classes, and it has not previously been applied to text mining tasks. We successfully test BDA under two schemas. The first is a traditional classification scenario using 10-fold cross-validation on four ground-truth standard corpora: LingSpam, SpamAssassin, the Phishing corpus, and a subset of the TREC 2007 spam corpus. In the second schema, we test the anticipatory properties of the statistical features on the TREC 2007 spam corpus. The contribution of this work is the evidence that BDA offers better discriminative features for email filtering, gives stable classification results regardless of the number of features chosen, and yields features that robustly retain their discriminative value over time.
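The transformation described in the abstract can be illustrated with a minimal NumPy sketch. It follows the general BDA idea (maximize the scatter of negatives around the positive centroid relative to the scatter of the positives themselves) and is not necessarily the exact variant used in the paper. The function name `bda_transform`, the regularization constant `reg`, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def bda_transform(X_pos, X_neg, n_components=2, reg=1e-3):
    """Hypothetical helper: return a BDA-style projection W and the positive centroid."""
    X_pos = np.asarray(X_pos, dtype=float)
    X_neg = np.asarray(X_neg, dtype=float)
    mu_pos = X_pos.mean(axis=0)

    # Both scatter matrices are measured around the POSITIVE centroid;
    # this is the "biased" part that distinguishes the idea from plain LDA.
    P = X_pos - mu_pos
    N = X_neg - mu_pos
    S_pos = P.T @ P / len(X_pos)
    S_neg = N.T @ N / len(X_neg)

    # Regularize S_pos (assumed constant) so the Cholesky factorization exists.
    d = S_pos.shape[0]
    L = np.linalg.cholesky(S_pos + reg * np.eye(d))
    L_inv = np.linalg.inv(L)

    # Whitening by L turns the generalized problem S_neg w = lambda S_pos w
    # into an ordinary symmetric eigenproblem.
    M = L_inv @ S_neg @ L_inv.T
    eigvals, V = np.linalg.eigh(M)
    top = np.argsort(eigvals)[::-1][:n_components]

    # Map the leading eigenvectors back to the original feature space.
    W = L_inv.T @ V[:, top]
    return W, mu_pos


# Synthetic stand-in for message feature vectors (not real email data).
rng = np.random.default_rng(0)
X_pos = rng.normal(0.0, 0.5, size=(50, 5))
X_neg = rng.normal(3.0, 2.0, size=(80, 5))

W, mu_pos = bda_transform(X_pos, X_neg)
Z_pos = (X_pos - mu_pos) @ W
Z_neg = (X_neg - mu_pos) @ W
print("mean squared distance to positive centroid (positives):",
      (Z_pos ** 2).sum(axis=1).mean())
print("mean squared distance to positive centroid (negatives):",
      (Z_neg ** 2).sum(axis=1).mean())
```

In the projected space the positives should sit close to their centroid while the negatives lie farther away. The projected vectors could then be passed to any standard classifier.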
Year
2010
DOI
10.1007/978-3-642-15387-7_60
Venue
KES (1)
Keywords
discriminative feature, linear discriminant analysis, discriminant analysis, biased discriminant analysis, spam corpus, different transformation, feature space transformation, discriminative value, content feature, phishing corpus, ground truth standard corpus, feature space, ground truth, text mining, feature extraction, cross validation
Field
Feature vector, Email filtering, Dimensionality reduction, Pattern recognition, Computer science, Feature extraction, Artificial intelligence, Linear discriminant analysis, Latent semantic analysis, Cross-validation, Discriminative model
DocType
Conference
Volume
6276
ISSN
0302-9743
ISBN
3-642-15386-0
Citations
3
PageRank
0.38
References
14
Authors
2
Name, Order, Citations, PageRank
Juan Carlos Gomez, 1, 84, 12.89
Marie-Francine Moens, 2, 1750, 139.27