Title
Complex queries over web repositories
Abstract
Web repositories, such as the Stanford WebBase repository, manage large heterogeneous collections of Web pages and associated indexes. For effective analysis and mining, these repositories must provide a declarative query interface that supports complex, expressive Web queries. Such queries have two key characteristics: (i) they view a Web repository simultaneously as a collection of text documents, as a navigable directed graph, and as a set of relational tables storing properties of Web pages (length, URL, title, etc.); (ii) they employ application-specific ranking and ordering relationships over pages and links to retrieve only the "best" query results. In this paper, we model a Web repository in terms of "Web relations" and describe an algebra for expressing complex Web queries. Our algebra extends traditional relational operators as well as graph navigation operators to uniformly handle plain, ranked, and ordered Web relations. In addition, we present an overview of the cost-based optimizer and execution engine that we have developed to efficiently execute Web queries over large repositories.
Year
2003
Venue
VLDB
Keywords
declarative query interface, graph navigation operator, complex expressive web query, stanford webbase repository, web relation, large repository, complex web query, web query, web repository, web page, complex query, indexation, web pages, directed graph
Field
Web search engine, Static web page, World Wide Web, Web mining, Information retrieval, Web page, Computer science, Web query classification, Data Web, Web modeling, Web navigation, Database
DocType
Conference
ISSN
Proceedings 2003 VLDB Conference
ISBN
0-12-722442-4
Citations
16
PageRank
1.42
References
12
Authors
2
Name
Order
Citations
PageRank
Sriram Raghavan 1 109697.25
Héctor García-Molina 2 243595652.13