The Impact of Feature Selection on Signature-Driven Spam Detection - Citegraph

Paper Info

Title
The Impact of Feature Selection on Signature-Driven Spam Detection

Abstract
Signature-driven spam detection provides an alternative to machine learning approaches and can be very effective when near-duplicates of essentially the same message are sent in high volume (20). Unfortunately, signatures can also be brittle to small alterations of message content. In this work we propose a technique for increasing signature robustness, targeting the I-Match algorithm (6), but applicable to other single-signature detection schemes. The proposed method is shown to consistently outperform traditional I-Match in the spam filtering application. As I-Match signature quality and stability depend on vocabulary control, we compare the traditional Zipfian approaches to feature selection with techniques applied typically in text categorization, which are found to provide viable alternatives. In particular, distributional word clustering is demonstrated to be effective in increasing signature robustness.
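
The dependence of a single signature on vocabulary control can be illustrated with a minimal Python sketch. It assumes an I-Match-style scheme in which a message is reduced to the set of its words that appear in a fixed lexicon, and that sorted set is hashed. The function name `imatch_signature`, the tiny lexicon, the sample messages, and the choice of SHA-1 are all hypothetical and used only for illustration; this is not the paper's implementation, and it does not show the robustness technique the paper proposes.

```python
import hashlib
import re


def imatch_signature(text, lexicon):
    """Hash the sorted set of lexicon words that occur in the message."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    kept = sorted(tokens & lexicon)
    if not kept:
        return None  # no lexicon words survive, so no signature is produced
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()


# Hypothetical lexicon; a real one would be selected from collection statistics.
LEXICON = {"cheap", "pills", "order", "online", "discount", "offer"}

original = "Order cheap pills online today!"
reordered = "Today: pills, CHEAP! Order online."
padded = "Order cheap pills online today! xqzv lorem"
altered = "Order cheap pills online today! Special offer"

sig_original = imatch_signature(original, LEXICON)
sig_reordered = imatch_signature(reordered, LEXICON)
sig_padded = imatch_signature(padded, LEXICON)
sig_altered = imatch_signature(altered, LEXICON)

# Word order and out-of-lexicon padding leave the signature unchanged,
# while a single added in-lexicon word produces a different signature.
print(sig_original == sig_reordered)
print(sig_original == sig_padded)
print(sig_original == sig_altered)
```

In this sketch the lexicon decides which alterations are invisible to the signature and which break it, which is why the choice of features matters for signature stability.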

Year	Venue	Keywords
2004	CEAS	feature selection,machine learning
Field	DocType	Citations
Bag-of-words model,Data mining,World Wide Web,Feature selection,Pattern recognition,Computer science,Filter (signal processing),Robustness (computer science),Artificial intelligence,Text categorization,Cluster analysis,Vocabulary	Conference	23
PageRank	References	Authors
1.39	12	3

Authors (3 rows)

Name	Order	Citations	PageRank
Aleksander Kołcz	1	628	66.65
Abdur Chowdhury	2	2013	160.59
Joshua Alspector	3	445	267.78
