Title
Robust personalizable spam filtering via local and global discrimination modeling.
Abstract
Content-based e-mail spam filtering continues to be a challenging machine learning problem. Usually, the joint distribution of e-mails and labels changes from user to user and from time to time, and the training data are poor representatives of the true distribution. E-mail service providers have two options for automatic spam filtering at the service-side: a single global filter for all users or a personalized filter for each user. The practical usefulness of these options, however, depends on the robustness and scalability of the filter. In this paper, we address these challenges by presenting a robust personalizable spam filter based on local and global discrimination modeling. Our filter exploits highly discriminating content terms, identified by their relative risk, to transform the input space into a two-dimensional feature space. This transformation is obtained by linearly pooling the discrimination information provided by each term for spam or non-spam classification. Following this local model, a linear discriminant is learned in the feature space for classification. We also present a strategy for personalizing the local and global models using unlabeled e-mails, without requiring user feedback. Experimental evaluations and comparisons are presented for global and personalized spam filtering, under varying distribution shift, for handling the problem of gray e-mails, on unseen e-mails, and with varying filter size. The results demonstrate the robustness and effectiveness of our filter and its suitability for global and personalized spam filtering at the service-side.
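The pipeline in the abstract can be sketched as a minimal illustration. It has four stages: relative-risk term selection, linear pooling into a two-dimensional space, a linear discriminant, and personalization from unlabeled e-mail. This is not the authors' implementation, and several choices are assumptions:

- An e-mail is represented as a set of terms.
- Relative risk is estimated from add-one smoothed document frequencies.
- A selected term's weight is its relative risk.
- Pooling averages the weights over the e-mail's terms.
- Fisher's linear discriminant, with a small ridge term, stands in for the paper's discriminant.
- Personalization is assumed to mean pseudo-labeling a user's inbox with the global filter and then retraining.

All names (`relative_risk_weights`, `to_2d`, `fit_fisher`, `train`, `predict`, `personalize`, `ridge`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Set, Tuple

Doc = Set[str]  # assumption: an e-mail is the set of content terms it contains


class Model(NamedTuple):
    spam_w: Dict[str, float]  # spam-discriminating terms -> weight (relative risk)
    ham_w: Dict[str, float]   # non-spam-discriminating terms -> weight
    w: Tuple[float, float]    # linear discriminant direction in the 2-D space
    bias: float               # discriminant offset


def relative_risk_weights(spam: List[Doc], ham: List[Doc], threshold: float = 0.0,
                          alpha: float = 1.0) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Keep terms whose smoothed relative risk exceeds 1 + threshold for one class."""
    df_s, df_h = Counter(), Counter()
    for d in spam:
        df_s.update(d)
    for d in ham:
        df_h.update(d)
    spam_w, ham_w = {}, {}
    for t in set(df_s) | set(df_h):
        # add-alpha smoothed document-frequency estimates of P(t | class)
        p_s = (df_s[t] + alpha) / (len(spam) + 2 * alpha)
        p_h = (df_h[t] + alpha) / (len(ham) + 2 * alpha)
        if p_s / p_h > 1 + threshold:
            spam_w[t] = p_s / p_h
        elif p_h / p_s > 1 + threshold:
            ham_w[t] = p_h / p_s
    return spam_w, ham_w


def to_2d(doc: Doc, spam_w: Dict[str, float], ham_w: Dict[str, float]) -> Tuple[float, float]:
    """Local model: pool term weights linearly into (spam score, non-spam score)."""
    if not doc:
        return 0.0, 0.0
    s = sum(spam_w.get(t, 0.0) for t in doc)
    h = sum(ham_w.get(t, 0.0) for t in doc)
    return s / len(doc), h / len(doc)


def fit_fisher(ham_pts, spam_pts, ridge: float = 1e-3):
    """Global model: Fisher's linear discriminant in 2-D (spam side is positive)."""
    def mean(ps):
        return (sum(p[0] for p in ps) / len(ps), sum(p[1] for p in ps) / len(ps))
    m0, m1 = mean(ham_pts), mean(spam_pts)
    a = b = c = 0.0  # within-class scatter matrix [[a, b], [b, c]]
    for ps, m in ((ham_pts, m0), (spam_pts, m1)):
        for x, y in ps:
            dx, dy = x - m[0], y - m[1]
            a, b, c = a + dx * dx, b + dx * dy, c + dy * dy
    a, c = a + ridge, c + ridge  # ridge keeps the scatter matrix invertible
    det = a * c - b * b
    d0, d1 = m1[0] - m0[0], m1[1] - m0[1]
    w = ((c * d0 - b * d1) / det, (a * d1 - b * d0) / det)  # Sw^{-1} (m1 - m0)
    # place the decision threshold midway between the projected class means
    bias = -0.5 * (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1]))
    return w, bias


def train(spam: List[Doc], ham: List[Doc], threshold: float = 0.0) -> Model:
    spam_w, ham_w = relative_risk_weights(spam, ham, threshold)
    w, bias = fit_fisher([to_2d(d, spam_w, ham_w) for d in ham],
                         [to_2d(d, spam_w, ham_w) for d in spam])
    return Model(spam_w, ham_w, w, bias)


def predict(model: Model, doc: Doc) -> bool:
    """Return True if the e-mail is classified as spam."""
    x, y = to_2d(doc, model.spam_w, model.ham_w)
    return model.w[0] * x + model.w[1] * y + model.bias > 0


def personalize(model: Model, inbox: List[Doc], threshold: float = 0.0) -> Model:
    """Assumed strategy: pseudo-label a user's unlabeled inbox with the global
    filter, then retrain both models on those labels (no user feedback)."""
    labels = [predict(model, d) for d in inbox]
    spam = [d for d, is_spam in zip(inbox, labels) if is_spam]
    ham = [d for d, is_spam in zip(inbox, labels) if not is_spam]
    if not spam or not ham:
        return model  # only one class was predicted: keep the global filter
    return train(spam, ham, threshold)
```

With a small labeled corpus, `train` builds the global filter, and `personalize` adapts it to one user's unlabeled inbox. In that step, terms seen only in that inbox (e.g. a user-specific spam word) can enter the personalized term sets. Averaging the weights over an e-mail's terms keeps both features on a comparable scale regardless of e-mail length; this normalization is also an assumption of the sketch.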
Year
2013
DOI
10.1007/s10115-012-0477-x
Venue
Knowl. Inf. Syst.
Keywords
E-mail classification, Local/global models, Distribution shift, Personalization, Gray e-mail
Field
Data mining, Feature vector, Joint probability distribution, Computer science, Pooling, Filter (signal processing), Robustness (computer science), Artificial intelligence, Linear discriminant analysis, Machine learning, Personalization, Scalability
DocType
Journal
Volume
34
Issue
2
ISSN
0219-3116
Citations
3
PageRank
0.38
References
48
Authors
2
Khurum Nazir Junejo (order 1; citations 57; PageRank 6.08)
Asim Karim (order 2; citations 136; PageRank 18.04)