Title
An Approach to Image Spam Filtering Based on Base64 Encoding and N-Gram Feature Extraction
Abstract
Compared with text spam, image spam is a variant devised to evade traditional text-based spam classification and filtering. Various approaches to image spam filtering have been proposed, each with its own advantages and drawbacks in terms of time cost and efficiency. In this paper, we propose a new approach based on the Base64 encoding of image files and the n-gram technique for feature extraction. By transforming normal images into their Base64 representation, we extract features of each image with the n-gram technique. With these features we train an SVM (support vector machine), which proves effective and efficient at distinguishing spam images from legitimate images. Using an online shared personal corpus of images as input, experimental results show that, compared with some existing feature extraction methods, our approach achieves very high performance for image spam classification in terms of basic measures such as accuracy, precision, and recall. Moreover, our approach is practical, requiring less running time for image spam classification than the other methods.
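The feature-extraction pipeline described in the abstract (Base64-encode the image file, then extract character n-grams from the encoded text) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the n-gram length n, the use of the top_k most frequent training n-grams as the vocabulary, and relative-frequency weighting are all hypothetical choices, and the helper names to_base64, ngram_counts, build_vocabulary, and feature_vector are invented for illustration. Only base64.b64encode is a real standard-library call.

```python
import base64
from collections import Counter
from typing import List, Sequence


def to_base64(data: bytes) -> str:
    """Base64-encode raw image bytes into an ASCII string (RFC 4648 alphabet)."""
    return base64.b64encode(data).decode("ascii")


def ngram_counts(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in text."""
    if n <= 0:
        raise ValueError("n must be positive")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def build_vocabulary(docs: Sequence[str], n: int = 3, top_k: int = 1000) -> List[str]:
    """ASSUMPTION (hypothetical): keep the top_k most frequent n-grams
    across all training documents as the feature vocabulary."""
    total: Counter = Counter()
    for doc in docs:
        total.update(ngram_counts(doc, n))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def feature_vector(doc: str, vocab: Sequence[str], n: int = 3) -> List[float]:
    """ASSUMPTION (hypothetical): weight each vocabulary n-gram by its
    relative frequency in doc; n-grams absent from doc get 0.0."""
    counts = ngram_counts(doc, n)
    total = sum(counts.values()) or 1  # avoid division by zero for very short docs
    return [counts[gram] / total for gram in vocab]


if __name__ == "__main__":
    # Toy byte strings stand in for real image files (illustration only).
    images = [b"\x89PNG\r\n\x1a\n...", b"GIF89a..."]
    encoded = [to_base64(img) for img in images]
    vocab = build_vocabulary(encoded, n=3, top_k=16)
    X = [feature_vector(doc, vocab, n=3) for doc in encoded]
    print(len(vocab), [len(row) for row in X])
```

With real data, each image would be read with open(path, "rb").read(), and the resulting vectors, paired with spam/legitimate labels, could be passed to any SVM implementation, for example scikit-learn's sklearn.svm.SVC. Note that Base64 maps every 3 input bytes to 4 output characters, so character n-grams over the encoded string do not in general align with byte boundaries in the original file; the sketch makes no attempt to correct for that.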
Year: 2010
DOI: 10.1109/ICTAI.2010.31
Venue: ICTAI (1)
Keywords: text-based spam filtering, image coding, spam image, image file, n-gram feature extraction, base64 encoding, text spam, image spam, gram technique, image spam filtering, traditional text-based spam classification, unsolicited e-mail, svm, text-based spam classification, feature extraction, image classification, support vector machine, image spam classification, legitimate image, normal image, support vector machines
Field: Bag-of-words model, Data mining, Pattern recognition, Computer science, Support vector machine, Filter (signal processing), Feature extraction, Image spam, Image file formats, Artificial intelligence, n-gram, Contextual image classification
DocType: Conference
Volume: 1
ISSN: 1082-3409
ISBN: 978-1-4244-8817-9
Citations: 0
PageRank: 0.34
References: 10
Authors: 3
1. Congfu Xu (Citations: 131, PageRank: 15.71)
2. Yafang Chen (Citations: 0, PageRank: 0.34)
3. Kevin Chiew (Citations: 17, PageRank: 4.32)