Title
Learning to Identify Regular Expressions that Describe Email Campaigns
Abstract
This paper addresses the problem of inferring a regular expression from a given set of strings that resembles, as closely as possible, the regular expression that a human expert would have written to identify the language. This is motivated by our goal of automating the task of postmasters of an email service who use regular expressions to describe and blacklist email spam campaigns. Training data contains batches of messages and corresponding regular expressions that an expert postmaster feels confident to blacklist. We model this task as a learning problem with structured output spaces and an appropriate loss function, derive a decoder and the resulting optimization problem, and report on a case study conducted with an email service.
Year: 2012
Venue: International Conference on Machine Learning
DocType: Conference
Volume: abs/1206.4637
Citations: 4
PageRank: 0.52
References: 13
Authors: 4
Name               Order  Citations  PageRank
Paul Prasse        1      13         3.45
Christoph Sawade   2      55         6.21
Niels Landwehr     3      506        31.54
Tobias Scheffer    4      1862       139.64