Exploiting site-level information to improve web search - Citegraph

Paper Info

Title
Exploiting site-level information to improve web search

Abstract
Ranking Web search results has long evolved beyond simple bag-of-words retrieval models. Modern search engines routinely employ machine learning ranking that relies on exogenous relevance signals. Yet the majority of current methods still evaluate each Web page out of context. In this work, we introduce a novel source of relevance information for Web search by evaluating each page in the context of its host Web site. For this purpose, we devise two strategies for compactly representing entire Web sites. We formalize our approach by building two indices, a traditional page index and a new site index, where each "document" represents an entire Web site. At runtime, a query is first executed against both indices, and then the final page score for a given query is produced by combining the scores of the page and its site. Experiments carried out on a large-scale Web search test collection from a major commercial search engine confirm that the proposed approach leads to consistent and significant improvements in retrieval effectiveness.
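The two-index idea in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the paper's method: the abstract does not say how sites are represented or how the two scores are combined, so this sketch concatenates a host's page text to form the site "document", uses a toy TF-IDF scorer as a stand-in, and mixes the scores linearly with a hypothetical weight `lam`. The names `Index`, `build_indices`, and `rank` are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict
from urllib.parse import urlparse


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


class Index:
    """Toy in-memory index with a TF-IDF style score (stand-in scorer)."""

    def __init__(self, docs):
        # docs maps a document id (page URL or site host) to its text.
        self.tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.df = Counter()
        for counts in self.tf.values():
            self.df.update(counts.keys())
        self.n = len(docs)

    def score(self, query, doc_id):
        counts = self.tf.get(doc_id, Counter())
        length = sum(counts.values()) or 1
        total = 0.0
        for term in tokenize(query):
            if counts[term]:
                idf = math.log(1 + self.n / self.df[term])
                total += (counts[term] / length) * idf
        return total


def host_of(url):
    """Return the host part of a URL, used here as the site identifier."""
    return urlparse(url).netloc


def build_indices(pages):
    """Build a page index and a site index from a {url: text} mapping.

    Assumption: a site is represented by concatenating its pages' text.
    """
    site_text = defaultdict(list)
    for url, text in pages.items():
        site_text[host_of(url)].append(text)
    sites = {host: " ".join(texts) for host, texts in site_text.items()}
    return Index(pages), Index(sites)


def rank(query, pages, lam=0.7):
    """Rank pages by lam * page_score + (1 - lam) * site_score.

    Assumption: linear interpolation; lam is a hypothetical mixing weight.
    """
    page_index, site_index = build_indices(pages)
    scored = []
    for url in pages:
        page_score = page_index.score(query, url)
        site_score = site_index.score(query, host_of(url))
        scored.append((lam * page_score + (1 - lam) * site_score, url))
    return [url for _, url in sorted(scored, reverse=True)]
```

With `lam=1.0` the ranking reduces to page-only scoring, so varying `lam` gives a simple way to see how much the site context changes the order of pages whose own text scores identically.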

Year	DOI	Venue
2010	10.1145/1871437.1871630	CIKM
Keywords	Field	DocType
bag of words,web pages,search engine,machine learning,algorithms	Web search engine,Static web page,Data mining,Web search query,Information retrieval,Web page,Computer science,Web query classification,Web modeling,Backlink,Web crawler	Conference
Citations	PageRank	References
11	0.63	14
Authors
6

Authors (6 rows)

Name	Order	Citations	PageRank
Andrei Broder	1	7357	920.20
Evgeniy Gabrilovich	2	4573	224.48
Vanja Josifovski	3	2265	148.84
George Mavromatis	4	18	1.29
Donald Metzler	5	3138	141.39
Jane Wang	6	11	0.63
