Title
Resolving FP-TP Conflict in Digest-Based Collaborative Spam Detection by Use of Negative Selection Algorithm
Abstract
A well-known approach to collaborative spam filtering is to determine which emails belong to the same bulk, e.g. by exploiting their content similarity. After an initial portion of a bulk has been observed, this allows bulkiness scores to be assigned to the remaining emails from the same bulk. It also allows individual evidence of spamminess to be joined, if such evidence is generated by collaborating filters or users for some of the emails from the initial portion of the bulk. Usually a database of previously observed emails or email digests is formed and queried upon receiving new emails. Previous evaluations (2, 10) of the approach based on email digests that preserve email content similarity indicate, and partially demonstrate, that there are ways to make the approach robust to increased obfuscation efforts by spammers. However, for the parameter settings that provide good matching between emails from the same bulk, the unwanted random matching between ham emails and unrelated ham and spam emails stays rather high. This directly translates into a need for higher bulkiness thresholds in order to ensure low false positive (FP) detection of ham, which implies that larger initial parts of spam bulks will not be filtered, i.e. true positive (TP) detection will not be very high (the FP-TP conflict). In this paper we demonstrate how, by use of the negative selection algorithm, the unwanted random matching between unrelated emails may be decreased by at least an order of magnitude, while preserving the same good matching between emails from the same bulk. We also show how this translates into at least an order of magnitude fewer undetected bulky spam emails, under the same ham misdetection requirements.
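The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-line character 4-gram digest, the Jaccard similarity, the 0.6 threshold, and all function names below are assumptions chosen for brevity, standing in for the similarity-preserving digests the paper evaluates. The sketch drops any digest that matches a known ham ("self") digest before the digest is stored or used to query the database, so that text shared with ordinary ham no longer raises the bulkiness score.

```python
# Toy sketch of negative selection applied to email digests.
# Assumptions (not from the paper): one digest per line, digest = set of
# character 4-grams, Jaccard similarity, a single threshold of 0.6.

def segment_digests(email, k=4):
    """Return one toy digest (frozenset of character k-grams) per line."""
    digests = []
    for line in email.splitlines():
        norm = " ".join(line.lower().split())
        if len(norm) >= k:
            digests.append(frozenset(norm[i:i + k]
                                     for i in range(len(norm) - k + 1)))
    return digests

def similarity(a, b):
    """Jaccard similarity between two non-empty digests."""
    return len(a & b) / len(a | b)

def negative_selection(digests, self_digests, threshold=0.6):
    """Keep only the digests that match no ham ("self") digest."""
    return [d for d in digests
            if all(similarity(d, s) < threshold for s in self_digests)]

def bulkiness(query_digests, database, threshold=0.6):
    """Count database digests matched by at least one query digest."""
    return sum(1 for d in database
               if any(similarity(q, d) >= threshold for q in query_digests))

# Hypothetical data: a signature line shared by ham and spam.
SIGNATURE = "Best regards and have a nice day"
ham_corpus = [SIGNATURE]
spam_seen = "Buy cheap pills now at our store\n" + SIGNATURE
spam_new = "Buy cheap pills now at our st0re\n" + SIGNATURE
ham_new = "Lunch on Friday?\n" + SIGNATURE

self_digests = [d for e in ham_corpus for d in segment_digests(e)]

# Without negative selection the shared signature causes a random match.
db_plain = segment_digests(spam_seen)
score_ham_plain = bulkiness(segment_digests(ham_new), db_plain)

# With negative selection, self-matching digests are removed on both sides.
db_ns = negative_selection(segment_digests(spam_seen), self_digests)
score_ham_ns = bulkiness(
    negative_selection(segment_digests(ham_new), self_digests), db_ns)
score_spam_ns = bulkiness(
    negative_selection(segment_digests(spam_new), self_digests), db_ns)

print(score_ham_plain, score_ham_ns, score_spam_ns)
```

In this toy setting the new ham email stops matching the observed spam once the shared signature digest is filtered out, while the lightly obfuscated spam copy still matches, which is the effect that permits a lower bulkiness threshold.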
Year
2008
Venue
CEAS
Keywords
open digest, detection, collaborative, data representation, robustness, email, negative selection algorithm, similarity hashing, obfuscation, spam, filtering, false positive, negative selection
Field
Data mining, World Wide Web, Internet privacy, External Data Representation, Computer science, Filter (signal processing), Robustness (computer science), Negative selection algorithm, Obfuscation
DocType
Conference
Citations
2
PageRank
0.37
References
4
Authors
3
Name                   Order  Citations  PageRank
Slavisa Sarafijanovic  1      95         8.20
Sabrina Perez          2      2          0.37
Jean-Yves Le Boudec    3      5075       471.48