Title
Partial retrieval of compressed semi-structured documents
Abstract
We describe a compression model for semi-structured documents, called the tri-structural contexts model (TSCM). The idea is that modelling start tags, attribute name/value pairs and textual words separately may reduce the entropy. Each attribute is combined with its value, and these pairs are stored in a separate container. We focus mainly on semi-static models and test the idea using a word-based tagged code, which allows random access and partial decompression of the compressed collection. Compression time is better than that of scmhuff, and decompression time is much lower than that of both scmhuff and xmlppm. The short partial-decompression time supports using the TSC model to keep semi-structured documents compressed at all times. The algorithm and the proposed model are useful in information retrieval systems.
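The separation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_contexts` and `entropy_bits`, the regular expressions, and the choice to record every closing tag as a generic "/" symbol are all assumptions made for illustration. The sketch routes start tags, combined attribute name/value pairs, and text words into three containers and reports the zero-order entropy of each.

```python
import math
import re
from collections import Counter

# Hypothetical tokenisation rules; real XML parsing is more involved.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)([^<>]*)>")
ATTR_RE = re.compile(r'([A-Za-z_][\w.-]*)\s*=\s*"([^"]*)"')
WORD_RE = re.compile(r"\w+")


def split_contexts(doc):
    """Split a document into three containers: tags, attribute pairs, words."""
    tags, attrs, words = [], [], []
    pos = 0
    for match in TAG_RE.finditer(doc):
        # Text between the previous tag and this one goes to the word container.
        words.extend(WORD_RE.findall(doc[pos:match.start()]))
        closing, name, rest = match.groups()
        # Assumption: closing tags are stored as one generic symbol.
        tags.append("/" if closing else name)
        # Each attribute is kept together with its value as a single symbol.
        for attr_name, attr_value in ATTR_RE.findall(rest):
            attrs.append(f'{attr_name}="{attr_value}"')
        pos = match.end()
    words.extend(WORD_RE.findall(doc[pos:]))
    return tags, attrs, words


def entropy_bits(symbols):
    """Zero-order empirical entropy in bits per symbol."""
    total = len(symbols)
    if total == 0:
        return 0.0
    counts = Counter(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    sample = '<book id="1"><title lang="en">Data compression</title></book>'
    for label, container in zip(("tags", "attrs", "words"), split_contexts(sample)):
        print(label, container, round(entropy_bits(container), 3))
```

Each container can then be given its own semi-static word-based model, which is the setting the abstract describes. The entropy figures from this sketch only indicate the effect of the split and say nothing about the compression ratios reported in the paper.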
Year
2010
DOI
10.1504/IJCAT.2010.034524
Venue
IJCAT
Keywords
compression model, tri-structural contexts model, compression time, tsc model, partial decompression, partial retrieval, semi-static model, decompression time, semi-structured document, shorter time, information retrieval
Field
Information system, Data mining, Static model, Decompression, Text compression, Static testing, Computer science, Data compression, Random access
DocType
Journal
Volume
38
Issue
4
ISSN
0952-8091
Citations
2
PageRank
0.36
References
19
Authors
2
Name | Order | Citations | PageRank
Ashutosh Gupta | 1 | 192 | 14.01
Suneeta Agarwal | 2 | 174 | 26.32