Abstract | ||
---|---|---|
We describe a compression model, the tri-structural contexts model (TSCM), for semi-structured documents. The idea is that separating start tags, attribute name/value pairs, and textual words may reduce the entropy. We also combine attributes with their values and store them in a separate container. We focus mainly on semi-static models and test the idea using a word-based tagged code, which allows random access and partial decompression of the compressed collection. Compression time is found to be better than that of scmhuff, and decompression time is much lower than that of scmhuff and xmlppm. The short partial-decompression time supports using the TSC model to keep semi-structured documents compressed at all times. The algorithm and the proposed model are useful in information retrieval systems. |
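
The separation described in the abstract can be illustrated with a minimal sketch. It assumes a simplified XML-like input with double-quoted attribute values and uses a regex tokenizer. The names `split_containers` and `zero_order_entropy` are hypothetical and chosen for illustration; this is not the authors' implementation, and the word-based tagged code itself is not reproduced here.

```python
import re
from collections import Counter
from math import log2

# Simplified tag pattern (assumption): handles <name attr="value" ...>,
# </name> and <name/>, with double-quoted attribute values only.
TAG_RE = re.compile(
    r'<(/?)([A-Za-z_][\w.-]*)((?:\s+[\w:.-]+\s*=\s*"[^"]*")*)\s*(/?)>'
)
ATTR_RE = re.compile(r'([\w:.-]+)\s*=\s*"([^"]*)"')
WORD_RE = re.compile(r"\w+")


def split_containers(doc):
    """Split a document into three symbol streams:
    start-tag names, combined attribute name=value pairs, and text words."""
    tags, attrs, words = [], [], []
    pos = 0
    for m in TAG_RE.finditer(doc):
        # Text between the previous tag and this one goes to the word container.
        words.extend(WORD_RE.findall(doc[pos:m.start()]))
        closing, name, attr_text, _self_closing = m.groups()
        if not closing:
            tags.append(name)
            # Each attribute is stored together with its value as one symbol.
            attrs.extend(f"{k}={v}" for k, v in ATTR_RE.findall(attr_text))
        pos = m.end()
    words.extend(WORD_RE.findall(doc[pos:]))
    return tags, attrs, words


def zero_order_entropy(symbols):
    """Zero-order empirical entropy in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in counts.values())


if __name__ == "__main__":
    doc = ('<book id="1"><title lang="en">Data compression</title>'
           '<title lang="en">Text retrieval</title></book>')
    tags, attrs, words = split_containers(doc)
    for label, stream in (("tags", tags), ("attrs", attrs), ("words", words)):
        print(label, stream, round(zero_order_entropy(stream), 3))
```

Each container could then be modelled with its own semi-static statistics. Whether that lowers the total coded size for a given collection is an empirical question, and this sketch makes no claim about it.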
Year | DOI | Venue |
---|---|---|
2010 | 10.1504/IJCAT.2010.034524 | IJCAT |
Keywords | Field | DocType |
compression model,tri-structural contexts model,compression time,tsc model,partial decompression,partial retrieval,semi-static model,decompression time,semi-structured document,shorter time,information retrieval | Information system,Data mining,Static model,Decompression,Text compression,Static testing,Computer science,Data compression,Random access | Journal |
Volume | Issue | ISSN |
38 | 4 | 0952-8091 |
Citations | PageRank | References |
2 | 0.36 | 19 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ashutosh Gupta | 1 | 192 | 14.01 |
Suneeta Agarwal | 2 | 174 | 26.32 |