Learning to create data-integrating queries - Citegraph

Paper Info

Title
Learning to create data-integrating queries

Abstract
The number of potentially-related data resources available for querying --- databases, data warehouses, virtual integrated schemas --- continues to grow rapidly. Perhaps no area has seen this problem as acutely as the life sciences, where hundreds of large, complex, interlinked data resources are available on fields like proteomics, genomics, disease studies, and pharmacology. The schemas of individual databases are often large on their own, but users also need to pose queries across multiple sources, exploiting foreign keys and schema mappings. Since the users are not experts, they typically rely on the existence of pre-defined Web forms and associated query templates, developed by programmers to meet the particular scientists' needs. Unfortunately, such forms are scarce commodities, often limited to a single database, and mismatched with biologists' information needs that are often context-sensitive and span multiple databases. We present a system with which a non-expert user can author new query templates and Web forms, to be reused by anyone with related information needs. The user poses keyword queries that are matched against source relations and their attributes; the system uses sequences of associations (e.g., foreign keys, links, schema mappings, synonyms, and taxonomies) to create multiple ranked queries linking the matches to keywords; the set of queries is attached to a Web query form. Now the user and his or her associates may pose specific queries by filling in parameters in the form. Importantly, the answers to this query are ranked and annotated with data provenance, and the user provides feedback on the utility of the answers, from which the system ultimately learns to assign costs to sources and associations according to the user's specific information need, as a result changing the ranking of the queries used to generate results. 
We evaluate the effectiveness of our method against "gold standard" costs from domain experts and demonstrate the method's scalability.
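
The abstract describes ranking queries built from sequences of associations and re-ranking them after user feedback, but it does not spell out the algorithm. The following is only a minimal sketch of that idea under loud assumptions: relations are nodes in a graph, each association is an edge with a numeric cost, a candidate query between two keyword matches is the cheapest path, and feedback is a plain additive penalty. The relation names, the costs, and the `penalize` update are all hypothetical and are not the learning method used in the paper.

```python
import heapq


def make_graph(edges):
    """Build an undirected graph: relation -> {neighbor relation: association cost}."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def cheapest_path(graph, source, target):
    """Dijkstra search: return (total_cost, path) for the lowest-cost association chain."""
    queue = [(0.0, source, [source])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, edge_cost in graph[node].items():
            if neighbor not in seen:
                heapq.heappush(queue, (cost + edge_cost, neighbor, path + [neighbor]))
    return float("inf"), []


def penalize(graph, path, step=0.5):
    """Hypothetical feedback rule: raise the cost of each association on a disliked path."""
    for a, b in zip(path, path[1:]):
        graph[a][b] += step
        graph[b][a] += step


if __name__ == "__main__":
    # Illustrative schema only; these relations and costs are invented for the sketch.
    graph = make_graph([
        ("gene", "protein", 1.0),
        ("protein", "disease", 1.0),
        ("gene", "pathway", 1.5),
        ("pathway", "disease", 1.0),
    ])
    print(cheapest_path(graph, "gene", "disease"))
    # The user marks the top-ranked query as unhelpful, so its associations get costlier.
    penalize(graph, ["gene", "protein", "disease"])
    print(cheapest_path(graph, "gene", "disease"))
```

In this toy setup the first search prefers the path through `protein`; after the penalty, the path through `pathway` becomes cheaper, which mirrors how changing association costs changes the ranking of the generated queries.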

Year	2008
DOI	10.14778/1453856.1453941
Venue	PVLDB
Keywords	author new query template,data warehouse,web query form,foreign key,data provenance,non-expert user,interlinked data resource,data-integrating query,potentially-related data resource,schema mapping,keyword query,information need,data integrity,gold standard
Field	Data warehouse,Web search query,Data mining,Information needs,Information retrieval,Ranking,Computer science,Foreign key,Specific-information,Schema (psychology),Database,Scalability
DocType	Journal
Volume	1
Issue	1
ISSN	2150-8097
Citations	41
PageRank	4.27
References	34
Authors	7

Authors (7 rows)

Name	Order	Citations	PageRank
Partha Pratim Talukdar	1	980	65.47
Marie Jacob	2	153	10.20
Muhammad Salman Mehmood	3	41	4.27
Koby Crammer	4	5252	466.86
Zachary G. Ives	5	3869	318.82
Fernando Pereira	6	17717	2124.79
Sudipto Guha	7	3006	292.44
