Title
Implementation of a high-speed and high-precision XML information retrieval system on relational databases
Abstract
This paper describes an XML information retrieval system that we have developed. It is based on the vector space model and is implemented on top of XRel, a relational XML database system developed in our research group. Because a single XML document usually contains many XML fragments, processing a query retrieves a large number of fragments. Keeping all of them degrades retrieval precision and increases query processing time, because some XML fragments are not appropriate as query targets. In existing methods, human experts manually select the retrieval targets when an XML collection is stored in the system. Such manual selection is not feasible when many kinds of XML documents are stored. To cope with this problem, we propose a method for automatically selecting document-centric fragments using three measures: period ratio, number of different words, and empirical rules. By deleting inappropriate data-centric fragments from the results of a keyword query, we can improve the accuracy and performance of our system. Performance evaluations confirmed the improvement in retrieval precision and query processing speed.
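The fragment selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the measures, so the definitions here (period ratio as the share of whitespace-separated tokens ending in a period, distinct words counted case-insensitively), the threshold values, and all function names are assumptions made for illustration. The empirical rules are not described in the abstract and are left out.

```python
import re


def period_ratio(text: str) -> float:
    """Share of whitespace-separated tokens that end with a period.

    Assumed definition; the abstract only names the measure.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t.endswith(".")) / len(tokens)


def distinct_word_count(text: str) -> int:
    """Number of different words, compared case-insensitively."""
    return len(set(re.findall(r"[a-z0-9]+", text.lower())))


def is_document_centric(
    text: str,
    min_period_ratio: float = 0.02,  # hypothetical threshold
    min_distinct_words: int = 5,     # hypothetical threshold
) -> bool:
    """Treat a fragment as document-centric if it passes both measures."""
    return (
        period_ratio(text) >= min_period_ratio
        and distinct_word_count(text) >= min_distinct_words
    )


def select_fragments(fragments: list[str]) -> list[str]:
    """Keep only fragments judged document-centric; drop the rest."""
    return [f for f in fragments if is_document_centric(f)]


prose = (
    "XML retrieval systems index many fragments. "
    "Some fragments contain running text. "
    "Others hold only short data values."
)
data = "2005 3977"
kept = select_fragments([prose, data])
```

In this sketch the sentence-like fragment is kept and the short numeric fragment is dropped, so only text-bearing fragments would remain as retrieval targets. In practice the thresholds would have to be tuned on the collection at hand.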
Year: 2005
DOI: 10.1007/978-3-540-34963-1_19
Venue: INEX
Keywords: xml document, vector space model, xml database, relational database
Field: Data mining, XML Encryption, Efficient XML Interchange, Streaming XML, Information retrieval, Computer science, XML validation, Document Structure Description, XML database, XML schema, XML Schema Editor
DocType: Conference
Volume: 3977
ISSN: 0302-9743
ISBN: 3-540-34962-6
Citations: 2
PageRank: 0.45
References: 5
Authors: 8
Name                 Order  Citations  PageRank
Kei Fujimoto         1      2          0.45
Toshiyuki Shimizu    2      191        29.19
Norimasa Terada      3      6          1.21
Kenji Hatano         4      30         10.41
Yu Suzuki            5      2          0.45
Toshiyuki Amagasa    6      421        78.46
Hiroko Kinutani      7      25         4.14
Masatoshi Yoshikawa  8      1655       282.19