Title
Automatic Personalized Spam Filtering through Significant Word Modeling
Abstract
Typically, spam filters are built on the assumption that the characteristics of e-mails in the training set are identical to those of the e-mails in the individual users' inboxes to which the filter will be applied. This assumption is often incorrect, leading to poor filter performance. A personalized spam filter is built by taking into account the characteristics of e-mails in each individual user's inbox. We present an automatic approach to personalized spam filtering that does not require user feedback. The proposed algorithm builds a statistical model of significant spam and non-spam words from the labeled training set and then updates it in multiple passes over the individual user's unlabeled inbox. Personalizing the model improves filtering performance. We evaluate our algorithm on two publicly available datasets. The results show that our algorithm is robust and scalable, and a viable solution to the server-side personalized spam filtering problem. Moreover, it outperforms published results on one dataset and performs on par with them on the second.
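The two-stage idea in the abstract (build a word model from labeled mail, then refine it over an unlabeled inbox) can be illustrated with a minimal sketch. This is not the authors' algorithm: the significance criterion (a difference in per-class document frequency above `threshold`), the additive scoring rule, and the names `build_model`, `is_spam`, and `personalize` are all assumptions made for illustration.

```python
from collections import Counter


def word_probs(docs):
    """Fraction of documents in `docs` that contain each word."""
    counts = Counter(w for d in docs for w in set(d.split()))
    n = max(len(docs), 1)
    return {w: c / n for w, c in counts.items()}


def build_model(spam_docs, ham_docs, threshold=0.2):
    """Keep words whose document frequency differs between the two
    classes by more than `threshold` (an assumed significance test)."""
    p_spam, p_ham = word_probs(spam_docs), word_probs(ham_docs)
    model = {}
    for w in set(p_spam) | set(p_ham):
        diff = p_spam.get(w, 0.0) - p_ham.get(w, 0.0)
        if abs(diff) > threshold:
            model[w] = diff  # positive: spam word, negative: non-spam word
    return model


def is_spam(model, doc):
    """Assumed scoring rule: sum the weights of the significant words present."""
    score = sum(model.get(w, 0.0) for w in set(doc.split()))
    return score > 0


def personalize(model, inbox, passes=2, threshold=0.2):
    """Relabel the unlabeled inbox with the current model and rebuild
    the model from those predicted labels, for a fixed number of passes."""
    for _ in range(passes):
        labels = [is_spam(model, d) for d in inbox]
        spam = [d for d, s in zip(inbox, labels) if s]
        ham = [d for d, s in zip(inbox, labels) if not s]
        if not spam or not ham:
            break  # cannot re-estimate word statistics from a single class
        model = build_model(spam, ham, threshold)
    return model
```

In this sketch no user feedback is needed at any point: the inbox labels used in each pass are the model's own predictions, so words that appear only in the user's mail can enter the model even though they were absent from the training set.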
Year: 2007
DOI: 10.1109/ICTAI.2007.66
Venue: Tools with Artificial Intelligence, 2007. ICTAI 2007. 19th IEEE International Conference
Keywords: information filtering, statistical analysis, unsolicited e-mail, automatic personalized spam filtering, available datasets, e-mails, labeled training set, statistical model, training set, word modeling
Field: Training set, Bag-of-words model, Data mining, Computer science, Filter (signal processing), Filtering problem, Artificial intelligence, Statistical model, Machine learning, Personalization, Statistical analysis, Scalability
DocType: Conference
Volume: 2
ISSN: 1082-3409
ISBN: 978-0-7695-3015-4
Citations: 5
PageRank: 0.46
References: 21
Authors: 2
Name                 Order  Citations  PageRank
Khurum Nazir Junejo  1      57         6.08
Asim Karim           2      136        18.04