Title
Unsupervised spam detection based on string alienness measures
Abstract
We propose an unsupervised method for detecting spam documents in a given set of documents, based on equivalence relations on strings. We give three measures for quantifying the alienness (i.e., how different a substring is from the others) of substrings within the documents. A document is then classified as spam if it contains a substring that belongs to an equivalence class with a high degree of alienness. The proposed method is unsupervised, language independent, and scalable. Computational experiments on data collected from Japanese web forums show that the method successfully discovers spam.
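The pipeline in the abstract (group substrings into equivalence classes, score each class, flag documents containing a high-scoring class) can be illustrated with a small sketch. This record does not state the paper's equivalence relation or its three alienness measures, so both are assumptions here. The sketch groups substrings that occur in exactly the same set of documents, and it uses a hypothetical stand-in measure: the length of the longest member of a class shared by at least two documents. The intuition is that copied spam shares unusually long strings. The names `substring_classes`, `alienness`, `detect_spam`, `threshold`, and `min_len` are all hypothetical.

```python
from collections import defaultdict


def substring_classes(docs, min_len=2):
    """Group distinct substrings into equivalence classes.

    Assumed (simplified) relation: two substrings are equivalent when they
    occur in exactly the same set of documents. Substrings are enumerated
    naively, so this is only suitable for small inputs.
    """
    occ = defaultdict(set)  # substring -> indices of documents containing it
    for i, d in enumerate(docs):
        for a in range(len(d)):
            for b in range(a + min_len, len(d) + 1):
                occ[d[a:b]].add(i)
    classes = defaultdict(list)  # frozenset of doc indices -> member substrings
    for s, ids in occ.items():
        classes[frozenset(ids)].append(s)
    return classes


def alienness(members):
    """Hypothetical stand-in measure: length of the longest member string."""
    return max(len(s) for s in members)


def detect_spam(docs, threshold, min_len=2):
    """Return indices of documents containing a class with alienness >= threshold."""
    spam = set()
    for ids, members in substring_classes(docs, min_len).items():
        if len(ids) < 2:
            continue  # ignore substrings unique to a single document
        if alienness(members) >= threshold:
            spam |= ids
    return spam


docs = [
    "buy cheap pills now at example shop",
    "hello, how are you today",
    "buy cheap pills now at example shop!!",
    "how are you doing",
]
print(detect_spam(docs, threshold=20))  # {0, 2}
```

Documents 0 and 2 share a 35-character string, while the longest string shared by documents 1 and 3 is `"how are you "` (12 characters), so only the copied advertisement exceeds the threshold of 20. The quadratic enumeration is chosen for readability; a scalable implementation would need an index structure such as a suffix tree instead.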
Year
2007
DOI
10.1007/978-3-540-75488-6_16
Venue
Discovery Science
Keywords
high degree, equivalence class, equivalence relation, unsupervised spam detection, unsupervised method, japanese web forum, computational experiment, string alienness measure, spam document, web pages, col, computer experiment
Field
Data mining, Substring, Equivalence relation, Computer science, Artificial intelligence, Equivalence class, Machine learning, Scalability
DocType
Conference
Volume
4755
ISSN
0302-9743
ISBN
3-540-75487-3
Citations
10
PageRank
0.61
References
13
Authors
4
Name | Order | Citations | PageRank
Kazuyuki Narisawa | 1 | 33 | 6.82
Hideo Bannai | 2 | 620 | 79.87
Kohei Hatano | 3 | 88 | 21.16
Masayuki Takeda | 4 | 902 | 79.24