Abstract

Content-based e-mail spam filtering continues to be a challenging machine learning problem. Usually, the joint distribution of e-mails and labels changes from user to user and from time to time, and the training data are poor representatives of the true distribution. E-mail service providers have two options for automatic spam filtering on the service side: a single global filter for all users or a personalized filter for each user. The practical usefulness of these options, however, depends on the robustness and scalability of the filter. In this paper, we address these challenges by presenting a robust, personalizable spam filter based on local and global discrimination modeling. Our filter exploits highly discriminating content terms, identified by their relative risk, to transform the input space into a two-dimensional feature space. This transformation is obtained by linearly pooling the discrimination information that each term provides for spam or non-spam classification. Following this local model, a linear discriminant is learned in the feature space for classification. We also present a strategy for personalizing the local and global models using unlabeled e-mails, without requiring user feedback. We present experimental evaluations and comparisons for global and personalized spam filtering, under varying distribution shift, for handling gray e-mails, on unseen e-mails, and with varying filter size. The results demonstrate the robustness and effectiveness of our filter and its suitability for global and personalized spam filtering on the service side.
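The abstract describes a two-stage pipeline: terms selected by relative risk map each e-mail to a two-dimensional (spam score, non-spam score) point, and a linear discriminant then separates the two classes in that plane. The following is a minimal Python sketch under stated assumptions, not the paper's published formulas: relative risk is estimated from Laplace-smoothed document frequencies, "linear pooling" is taken to be a length-normalized weighted sum of term weights, the discriminant is Fisher's linear discriminant with a midpoint threshold, and the `min_rr` cutoff, `smoothing` parameter, and `personalize` self-training step are hypothetical choices made for illustration.

```python
from collections import Counter


def doc_freqs(docs):
    """Count how many documents contain each term (presence, not frequency)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def relative_risk_weights(spam_docs, ham_docs, smoothing=1.0, min_rr=1.5):
    """Map each highly discriminating term to (label, relative risk).

    label 1 = spam-indicating, 0 = non-spam-indicating. Terms whose relative
    risk falls below the (hypothetical) min_rr cutoff are dropped.
    """
    df_s, df_h = doc_freqs(spam_docs), doc_freqs(ham_docs)
    n_s, n_h = len(spam_docs), len(ham_docs)
    weights = {}
    for term in set(df_s) | set(df_h):
        # Laplace-smoothed probability that a spam / non-spam e-mail contains the term
        p_s = (df_s[term] + smoothing) / (n_s + 2 * smoothing)
        p_h = (df_h[term] + smoothing) / (n_h + 2 * smoothing)
        label, rr = (1, p_s / p_h) if p_s >= p_h else (0, p_h / p_s)
        if rr >= min_rr:
            weights[term] = (label, rr)
    return weights


def to_2d(doc, weights):
    """Pool term weights linearly into a (spam score, non-spam score) point."""
    counts = Counter(doc)
    total = sum(counts.values()) or 1  # normalize by e-mail length
    scores = [0.0, 0.0]  # index = label: 0 non-spam, 1 spam
    for term, count in counts.items():
        if term in weights:
            label, w = weights[term]
            scores[label] += count * w
    return scores[1] / total, scores[0] / total


def fit_fisher_2d(points, labels, ridge=1e-6):
    """Fisher linear discriminant in 2-D: w = S_w^-1 (m_spam - m_ham).

    Assumes both classes are present in labels.
    """
    groups = {0: [], 1: []}
    for p, y in zip(points, labels):
        groups[y].append(p)
    means = {y: (sum(p[0] for p in ps) / len(ps), sum(p[1] for p in ps) / len(ps))
             for y, ps in groups.items()}
    sxx = syy = ridge  # small ridge keeps the within-class scatter invertible
    sxy = 0.0
    for y, ps in groups.items():
        mx, my = means[y]
        for x1, x2 in ps:
            sxx += (x1 - mx) ** 2
            syy += (x2 - my) ** 2
            sxy += (x1 - mx) * (x2 - my)
    det = sxx * syy - sxy * sxy
    d1 = means[1][0] - means[0][0]
    d2 = means[1][1] - means[0][1]
    # Closed-form inverse of the 2x2 scatter matrix applied to the mean difference
    w1 = (syy * d1 - sxy * d2) / det
    w2 = (sxx * d2 - sxy * d1) / det
    # Threshold halfway between the projected class means
    b = (w1 * (means[1][0] + means[0][0]) + w2 * (means[1][1] + means[0][1])) / 2
    return w1, w2, b


def predict(point, model):
    """Return 1 (spam) if the point lies on the spam side of the discriminant."""
    w1, w2, b = model
    return 1 if w1 * point[0] + w2 * point[1] > b else 0


def personalize(weights, model, unlabeled_docs, **rr_kwargs):
    """Hypothetical self-training step: pseudo-label a user's unlabeled e-mails
    with the current filter, then re-estimate weights and discriminant on them."""
    spam, ham = [], []
    for doc in unlabeled_docs:
        (spam if predict(to_2d(doc, weights), model) else ham).append(doc)
    if not spam or not ham:  # both classes are needed to refit
        return weights, model
    new_weights = relative_risk_weights(spam, ham, **rr_kwargs)
    points = [to_2d(doc, new_weights) for doc in spam + ham]
    return new_weights, fit_fisher_2d(points, [1] * len(spam) + [0] * len(ham))
```

Because the discriminant sees only two features per e-mail, refitting it for a single user is cheap; in this sketch only the term weights grow with vocabulary size, which is one plausible way to read the abstract's emphasis on scalability and filter size.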
Year | DOI | Venue |
---|---|---|
2013 | 10.1007/s10115-012-0477-x | Knowl. Inf. Syst. |

Keywords | Field | DocType |
---|---|---|
E-mail classification, Local/global models, Distribution shift, Personalization, Gray e-mail | Data mining, Feature vector, Joint probability distribution, Computer science, Pooling, Filter (signal processing), Robustness (computer science), Artificial intelligence, Linear discriminant analysis, Machine learning, Personalization, Scalability | Journal |

Volume | Issue | ISSN |
---|---|---|
34 | 2 | 0219-3116 |

Citations | PageRank | References |
---|---|---|
3 | 0.38 | 48 |

Authors |
---|
2 |

Name | Order | Citations | PageRank |
---|---|---|---|
Khurum Nazir Junejo | 1 | 57 | 6.08 |
Asim Karim | 2 | 136 | 18.04 |