Title
Supervised clustering of streaming data for email batch detection
Abstract
We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
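The idea of sequential decoding over a stream, where each decision is final, can be illustrated with a minimal sketch. This is not the authors' decoder: the function names `sequential_cluster` and `jaccard`, the fixed token-overlap similarity, and the `threshold` parameter are all assumptions made for illustration. In the supervised setting described in the abstract, the scoring function would be learned from examples of correct clusterings rather than fixed by hand.

```python
from typing import Callable, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard overlap (a hypothetical stand-in for a learned similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def sequential_cluster(
    stream: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Greedily assign each arriving item to one cluster; never revisit a decision."""
    clusters: List[List[str]] = []
    for item in stream:
        best_idx, best_score = -1, threshold
        for idx, cluster in enumerate(clusters):
            # Score a cluster by the mean similarity to its current members.
            score = sum(similarity(item, m) for m in cluster) / len(cluster)
            if score > best_score:
                best_idx, best_score = idx, score
        if best_idx < 0:
            # No existing cluster scores above the threshold: open a new batch.
            clusters.append([item])
        else:
            clusters[best_idx].append(item)
    return clusters
```

In this sketch each arriving item is compared once against every previously seen item, so the work per item grows linearly with the number of items seen so far, and earlier assignments are never changed.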
Year
2007
DOI
10.1145/1273496.1273540
Venue
ICML
Keywords
collective information, spam emails, supervised clustering, decoding problem, corresponding optimization problem, email batch detection, sequential decoding procedure, collective attribute, email batch, correct clusterings, entire batch, col
Field
Data mining, Sequential decoding, Computer science, Artificial intelligence, Streaming data, Decoding methods, Time complexity, Cluster analysis, Optimization problem, Machine learning
DocType
Conference
Citations
12
PageRank
0.66
References
17
Authors
3
Name             Order  Citations  PageRank
Haider, Peter    1      98         7.51
Ulf Brefeld      2      633        51.89
Tobias Scheffer  3      1862       139.64