Title
An algebra for hierarchically organized text-dominated databases
Abstract
Structured documents are usually composed of nested text elements; for example, reports contain chapters, chapters contain sections, …, sentences contain words. The containment relationships of these text elements define a text hierarchy that can be exploited during search activities such as database browsing and full-text retrieval. During a database load the system typically constructs concordance lists, each list maintaining the locations of all occurrences of a particular type of text element. Although not necessarily constructed in practice, a complete set of concordance lists would constitute an equivalent representation of the database, namely its inverted form. This paper describes an algebra based on various primitive operators that use concordance lists as operands. These primitives can be used to define higher-level filter operators that specify whether a contiguous text extent will be selected or rejected during a search. The main contribution of the paper is the presentation of this algebra as a theoretical model that can be used to define a conceptual schema for the database. This theoretical model provides both a mathematically well-defined abstraction for the database and a basis for database implementation, since it may be used to formally define the search protocols between the database query facilities and the underlying retrieval engine.
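The idea of concordance lists as operands can be illustrated with a minimal sketch. It assumes, purely for illustration, that a text extent is an inclusive (start, end) pair of word positions and that a concordance list is a sorted list of such extents. The function names (`build_concordance`, `extents_ending_at`, `select_containing`, `select_contained_in`, `reject_containing`) are hypothetical and are not the operators defined in the paper; they only show how containment-based filters over concordance lists might select or reject extents.

```python
from typing import Dict, List, Tuple

# Assumption: an extent is an inclusive (start, end) pair of word positions.
Extent = Tuple[int, int]


def build_concordance(words: List[str]) -> Dict[str, List[Extent]]:
    """Map each token to the list of extents where it occurs (in text order)."""
    lists: Dict[str, List[Extent]] = {}
    for pos, word in enumerate(words):
        lists.setdefault(word.lower(), []).append((pos, pos))
    return lists


def extents_ending_at(markers: List[Extent]) -> List[Extent]:
    """Derive enclosing extents (e.g. sentences) from end-marker positions."""
    result: List[Extent] = []
    start = 0
    for end, _ in markers:
        result.append((start, end))
        start = end + 1
    return result


def contains(outer: Extent, inner: Extent) -> bool:
    """True when `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def select_containing(a: List[Extent], b: List[Extent]) -> List[Extent]:
    """Keep the extents of `a` that contain at least one extent of `b`."""
    return [x for x in a if any(contains(x, y) for y in b)]


def select_contained_in(a: List[Extent], b: List[Extent]) -> List[Extent]:
    """Keep the extents of `a` that lie inside at least one extent of `b`."""
    return [x for x in a if any(contains(y, x) for y in b)]


def reject_containing(a: List[Extent], b: List[Extent]) -> List[Extent]:
    """Keep the extents of `a` that contain no extent of `b`."""
    return [x for x in a if not any(contains(x, y) for y in b)]


# Toy database: two sentences, each terminated by a period token.
words = "the cat sat . the dog ran .".split()
conc = build_concordance(words)
sentences = extents_ending_at(conc["."])

with_dog = select_containing(sentences, conc["dog"])
without_dog = reject_containing(sentences, conc["dog"])
the_in_dog_sentence = select_contained_in(conc["the"], with_dog)

print(sentences)            # [(0, 3), (4, 7)]
print(with_dog)             # [(4, 7)]
print(without_dog)          # [(0, 3)]
print(the_in_dog_sentence)  # [(4, 4)]
```

The quadratic scans keep the sketch short; because the lists are in text order, a real retrieval engine would more plausibly merge them in a single pass.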
Year
1992
DOI
10.1016/0306-4573(92)90079-F
Venue
Inf. Process. Manage.
Keywords
hierarchically organized text-dominated databases, database design, algebra, information retrieval
Field
Data mining, Abstraction, Computer science, View, Database schema, Operator (computer programming), Hierarchy, Conceptual schema, Information retrieval, Algebra, Operand, Database design, Database
DocType
Journal
Volume
28
Issue
3
ISSN
Information Processing and Management
Citations
28
PageRank
18.96
References
6
Authors
1
Name
Order
Citations
PageRank
F. J. Burkowski125588.69