Title
PSSF: A Novel Statistical Approach for Personalized Service-side Spam Filtering
Abstract
The volume of spam e-mails has grown rapidly in the last two years, resulting in increasing costs to users, network operators, and e-mail service providers (ESPs). E-mail users demand accurate spam filtering with minimum effort from their side. Since the distribution of spam and non-spam e-mails is often different for different users, a single filter trained on a general corpus is not optimal for all users. The question asked by ESPs is: How do you build robust and scalable automatic personalized spam filters? We address this question by presenting PSSF, a novel statistical approach for personalized service-side spam filtering. PSSF builds a discriminative classifier from a statistical model of spam and non-spam e-mails. A classifier is first built on a general training corpus and is then adapted in one or more passes of soft labeling and classifier rebuilding over each user's unlabeled e-mails. The statistical model captures the distribution of tokens in spam and non-spam e-mails. This model is robust in the sense that its size can be reduced significantly without degrading filtering performance. We evaluate PSSF on two datasets. The results demonstrate the superior performance and scalability of PSSF in comparison with other published results on the same datasets.
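The adaptation loop described in the abstract (build on a general corpus, then alternate soft labeling and rebuilding on a user's unlabeled e-mails) can be illustrated with a minimal sketch. This is not the PSSF implementation: the token model here is a simple smoothed log-odds score (naive-Bayes style) swapped in for illustration, and the function names `train`, `p_spam`, and `personalize`, the number of passes, and the choice to rebuild on the general corpus plus the soft-labeled user e-mails are all assumptions.

```python
import math
from collections import Counter


def train(docs, weights):
    """Fit a token model from token lists and soft labels.

    docs: list of token lists; weights: P(spam) in [0, 1] for each doc.
    Returns (per-token log-odds, prior log-odds). Hypothetical helper.
    """
    spam, ham = Counter(), Counter()
    for tokens, p in zip(docs, weights):
        for t in set(tokens):          # count each token once per e-mail
            spam[t] += p               # fractional count toward spam
            ham[t] += 1.0 - p          # fractional count toward non-spam
    n_spam = sum(weights)
    n_ham = len(weights) - n_spam
    # Laplace-smoothed log-odds of seeing the token in spam vs. non-spam.
    log_odds = {
        t: math.log((spam[t] + 1.0) / (n_spam + 2.0))
        - math.log((ham[t] + 1.0) / (n_ham + 2.0))
        for t in set(spam) | set(ham)
    }
    prior = math.log((n_spam + 1.0) / (n_ham + 1.0))
    return log_odds, prior


def p_spam(model, tokens):
    """Soft label: estimated probability that an e-mail is spam."""
    log_odds, prior = model
    score = prior + sum(log_odds.get(t, 0.0) for t in set(tokens))
    score = max(-30.0, min(30.0, score))   # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-score))


def personalize(general_docs, general_labels, user_docs, passes=2):
    """Adapt a general model to one user's unlabeled e-mails (assumed scheme)."""
    hard = [float(y) for y in general_labels]
    model = train(general_docs, hard)
    for _ in range(passes):
        soft = [p_spam(model, d) for d in user_docs]        # soft labeling
        model = train(general_docs + user_docs, hard + soft)  # rebuilding
    return model


general_docs = [["free", "money"], ["win", "money"],
                ["meeting", "report"], ["lunch", "meeting"]]
general_labels = [1, 1, 0, 0]
user_docs = [["free", "offer", "win"], ["report", "meeting", "agenda"]]

general_model = train(general_docs, [float(y) for y in general_labels])
user_model = personalize(general_docs, general_labels, user_docs)
```

In this toy data the tokens "offer" and "agenda" never occur in the general corpus, so the general model is neutral on them; after adaptation they pick up evidence from the user's own soft-labeled e-mails, which is the kind of per-user shift in token distribution the abstract refers to.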
Year
2007
DOI
10.1109/WI.2007.86
Venue
Web Intelligence
Keywords
classifier rebuilding, scalable automatic personalized spam, novel statistical approach, personalized service-side spam filtering, discriminative classifier, different user, personalized service-side spam, statistical model, spam e-mail, non-spam e-mail, accurate spam, clustering, degradation, scalability, labeling, service provider, filtering, robustness
Field
Data mining, Web mining, Language translation, Computer science, Robustness (computer science), Statistical model, Artificial intelligence, Classifier (linguistics), Cluster analysis, Discriminative model, Machine learning, Scalability
DocType
Conference
ISBN
0-7695-3026-5
Citations
14
PageRank
0.79
References
13
Authors
2
Name | Order | Citations | PageRank
Khurum Nazir Junejo | 1 | 57 | 6.08
Asim Karim | 2 | 136 | 18.04