Abstract |
---|
Spam pages on the web use various techniques to artificially achieve high rankings in search engine results. Human experts can do a good job of identifying spam pages and pages whose information is of dubious quality, but it is practically infeasible to use human effort for a large number of pages. Similar to the approach in (1), we propose a method of selecting a seed set of pages to be evaluated by a human. We then use the link structure of the web and the manually labeled seed set to detect other spam pages. Our experiments on the WebGraph dataset (3) show that our approach is very effective at detecting spam pages from a small seed set and achieves higher precision of spam page detection than the TrustRank algorithm, while also detecting pages with higher PageRank on average. |
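
The abstract does not spell out how labels spread from the seed set, so the following is only a minimal sketch of one plausible reading: distrust starts at the manually labeled spam seeds and flows backward along links (to the pages that link to a distrusted page), using a personalized-PageRank-style iteration. The function name `propagate_distrust`, the `damping` and `iterations` parameters, and the toy graph in the usage note are assumptions made for illustration, not details taken from the paper.

```python
def propagate_distrust(links, spam_seeds, damping=0.85, iterations=50):
    """Hypothetical sketch: spread distrust from spam seeds to pages linking to them.

    links      -- dict mapping each page to the list of pages it links to
    spam_seeds -- set of pages a human has labeled as spam
    Returns a dict mapping each page to a distrust score (higher = more suspect).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Build the reverse graph: for each page, who links to it.
    inlinks = {p: [] for p in pages}
    for src, targets in links.items():
        for dst in targets:
            inlinks[dst].append(src)

    # Teleport vector concentrated on the labeled spam seeds (assumed uniform).
    seed = {p: (1.0 / len(spam_seeds) if p in spam_seeds else 0.0) for p in pages}
    score = dict(seed)

    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * seed[p] for p in pages}
        for p in pages:
            sources = inlinks[p]
            if not sources:
                continue  # no page links here; its share is simply dropped
            share = damping * score[p] / len(sources)
            for q in sources:
                nxt[q] += share  # pages linking to a distrusted page inherit distrust
        score = nxt
    return score
```

Under these assumptions, calling `propagate_distrust({"a": ["spam"], "b": ["a"], "c": ["d"]}, {"spam"})` gives nonzero scores to `a` and `b`, which reach the seed through links, and zero to `c` and `d`, which do not. Pages could then be ranked by score, with the highest-scoring ones flagged for review.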
Year | Venue | Keywords |
---|---|---|
2006 | AIRWeb | web spam, search engine |
Field | DocType | Citations |
---|---|---|
Data mining, Search engine, Information retrieval, Webgraph, TrustRank, Computer science, Spamdexing | Conference | 65 |
PageRank | References | Authors |
---|---|---|
2.32 | 2 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Vijay Krishnan | 1 | 193 | 11.34 |
Rashmi Raj | 2 | 68 | 2.73 |