Title
Information Retrieval System for XML Documents
Abstract
In the research field of document information retrieval, the unit of retrieval results returned by IR systems is a whole document or a document fragment, such as a paragraph in passage retrieval. IR systems based on the vector space model compute feature vectors of these units and calculate the similarities between the units and the query. However, these units are not suitable for document information retrieval because they do not match the information that users are searching for. Therefore, the unit of retrieval results should be a portion of the XML document, such as a chapter, section, or subsection. That is, we think the most important concern of document information retrieval is to define a unit of retrieval results that is meaningful for users. It is easy to construct an appropriate portion of an XML document as a retrieval result because XML is a standard document format on the Internet and because XML documents consist of contents and document structures. In this paper, we propose an effective IR system for XML documents that automatically defines an appropriate unit of retrieval results by analyzing the XML document structure. We performed experimental evaluations and verified the effectiveness of our XML IR system. In addition, we defined new recall and precision measures for XML information retrieval in order to evaluate our XML IR system.
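The idea of scoring structural portions of an XML document against a query with the vector space model can be illustrated with a minimal sketch. This is not the authors' system: the tag set in UNIT_TAGS, the raw term-frequency weighting, the sample document DOC, and the function names are all assumptions made for illustration only.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: these tags mark the structural portions used as retrieval units.
UNIT_TAGS = {"chapter", "section", "subsection"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_units(xml_text, query):
    """Score every candidate unit against the query, best match first."""
    root = ET.fromstring(xml_text)
    query_vec = Counter(tokenize(query))
    scored = []
    for elem in root.iter():
        if elem.tag in UNIT_TAGS:
            # The feature vector covers all text nested inside the element.
            unit_vec = Counter(tokenize(" ".join(elem.itertext())))
            scored.append((cosine(query_vec, unit_vec), elem.tag, elem.get("id")))
    return sorted(scored, key=lambda s: s[0], reverse=True)


# Hypothetical sample document, not taken from the paper.
DOC = """<doc>
<chapter id="c1"><title>Storage</title>
<section id="s1"><p>Relational databases store tables on disk.</p></section>
<section id="s2"><p>XML retrieval ranks document fragments for a query.</p></section>
</chapter>
</doc>"""

results = rank_units(DOC, "xml retrieval")
for score, tag, unit_id in results:
    print(f"{score:.3f} {tag} {unit_id}")
```

In this sketch the section that mentions the query terms ranks above its enclosing chapter, because the chapter's vector is diluted by unrelated text. That is one way to see why the choice of retrieval unit affects what the user gets back.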
Year: 2002
DOI: 10.1007/3-540-46146-9_75
Venue: DEXA
Keywords: xml document structure, passage retrieval, retrieval result, xml ir system, document information retrieval, information retrieval system, xml information retrieval, document fragment, xml document, xml documents, ir system, document structure, feature vector, information retrieval, vector space model
Field: XML Encryption, Streaming XML, Information retrieval, Well-formed document, XML validation, Computer science, Document Structure Description, XML schema, Simple API for XML, Database, XML Catalog
DocType: Conference
ISBN: 3-540-44126-3
Citations: 6
PageRank: 0.67
References: 9
Authors: 4
Name                 Order  Citations  PageRank
Kenji Hatano         1      30         10.41
Hiroko Kinutani      2      25         4.14
Masatoshi Yoshikawa  3      1655       282.19
Shunsuke Uemura      4      714        78.79