Title
An incremental cluster-based approach to spam filtering
Abstract
As email has become a popular means of communication over the Internet, the problem of unsolicited and undesired emails, called spam or junk mail, has grown severe. To filter spam from legitimate emails, automatic classification approaches based on text mining techniques have been proposed. Such approaches, however, often suffer from a low recall rate owing to two characteristics of spam: skewed class distributions and concept drift. This research therefore proposes a classification approach that alleviates the problems of skewed class distributions and drifting concepts. A cluster-based classification method, called ICBC, is developed accordingly. ICBC contains two phases. In the first phase, it clusters the emails in each given class into several groups and extracts an equal number of features (keywords) from each group, so that the features of the minority class are made manifest. In the second phase, ICBC is equipped with an incremental learning mechanism that adapts to changes in the environment quickly and at low cost. Three experiments are conducted to evaluate the performance of ICBC. The results show that ICBC can deal effectively with skewed and changing class distributions, and that its incremental learning reduces the cost of re-training. The feasibility of the proposed approach is thus justified.
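The first phase described in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or keyword-scoring measure ICBC uses, so the toy k-means over term counts, the frequency-based ranking, the sample emails, and the names `cluster_class`, `top_terms`, and `select_features` are all assumptions made for illustration; this is not the authors' implementation.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_class(docs, k, iterations=10):
    """Group the token lists of one class into at most k clusters (toy k-means)."""
    vectors = [Counter(d) for d in docs]
    centroids = [Counter(v) for v in vectors[:k]]  # assumed: deterministic seeding
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in vectors:
            best = max(range(len(centroids)), key=lambda i: cosine(v, centroids[i]))
            groups[best].append(v)
        # A centroid is the summed counts of its members; cosine ignores scale.
        centroids = [sum(g, Counter()) if g else c for g, c in zip(groups, centroids)]
    return [g for g in groups if g]


def top_terms(group, n):
    """Return the n most frequent terms of one cluster (ties broken alphabetically)."""
    totals = sum(group, Counter())
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


def select_features(emails, k, n):
    """emails: list of (tokens, label). Take n keywords from every cluster of every class."""
    features = set()
    for label in sorted({lab for _, lab in emails}):
        docs = [tokens for tokens, lab in emails if lab == label]
        for group in cluster_class(docs, k):
            features.update(top_terms(group, n))
    return features


# Hypothetical, skewed toy data: four legitimate emails, two spam emails.
emails = [
    (["meeting", "agenda", "monday"], "ham"),
    (["project", "deadline", "report"], "ham"),
    (["meeting", "agenda", "notes"], "ham"),
    (["project", "report", "draft"], "ham"),
    (["cheap", "pills", "offer"], "spam"),
    (["lottery", "winner", "prize"], "spam"),
]
features = select_features(emails, k=2, n=2)
```

Because every cluster contributes the same number of keywords, the two spam emails supply as many features as the four legitimate ones in this toy data, which is the balancing effect the abstract attributes to the first phase.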
Year
2008
DOI
10.1016/j.eswa.2007.01.018
Venue
Expert Syst. Appl.
Keywords
skewed class distribution, incremental cluster-based approach, class distribution, email classification, concept drift, minority class, appropriate classification approach, incremental learning, legitimate emails, clusters emails, cluster-based classification method, undesired emails, automatic classification, text mining
Field
Data mining, Recall rate, Computer science, Incremental learning, Email classification, Filter (signal processing), Concept drift, Artificial intelligence, Machine learning, The Internet
DocType
Journal
Volume
34
Issue
3
ISSN
Expert Systems With Applications
Citations
23
PageRank
1.03
References
13
Authors
2
Name, Order, Citations, PageRank
Wen-Feng Hsiao, 1, 44, 4.94
Te-Min Chang, 2, 34, 6.29