Abstract |
---|
Efficient processing of tera-scale text data is an important research topic. This paper proposes lossless compression of N-gram language models based on LOUDS, a succinct data structure. LOUDS represents a trie with M nodes succinctly as a (2M + 1)-bit string; we compress it further by exploiting the structure of N-gram language models. We also apply variable-length coding and block-wise compression to the values associated with trie nodes. Experiments on three large-scale N-gram compression tasks achieve a significant compression rate without any loss. |
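The abstract's central representation can be illustrated with a small sketch. LOUDS (Level-Order Unary Degree Sequence) encodes a tree by visiting nodes breadth-first and writing one `1` per child followed by a terminating `0`, with a leading `10` for a virtual super-root; for M nodes this yields exactly 2M + 1 bits. The adjacency-dict tree representation and the function name below are illustrative assumptions, not taken from the paper:

```python
from collections import deque

def louds_bits(tree, root):
    """Encode a tree as a LOUDS bit string (2M + 1 bits for M nodes).

    `tree` maps a node to its list of children (hypothetical
    representation chosen for this sketch). Nodes are visited in
    breadth-first order; each node contributes one '1' as a child
    and one '0' terminating its own child list. The leading "10"
    encodes a virtual super-root whose only child is the real root.
    """
    bits = "10"  # super-root: one child (the root), then terminator
    queue = deque([root])
    while queue:
        node = queue.popleft()
        children = tree.get(node, [])
        bits += "1" * len(children) + "0"
        queue.extend(children)
    return bits

# 5-node trie: a -> (b, c), b -> (d, e)
tree = {"a": ["b", "c"], "b": ["d", "e"]}
print(louds_bits(tree, "a"))  # "10110110000", 11 bits = 2*5 + 1
```

Navigation (finding a node's children or parent) is then answered with rank/select queries over this bit string, which is what makes the representation succinct rather than merely compact.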
Year | Venue | Keywords |
---|---|---|
2009 | ACL/IJCNLP (Short Papers) | significant compression rate, lossless compression, succinct n-gram language model, n-gram language model structure, tera-scale text data, LOUDS, block-wise compression, n-gram language model, large-scale n-gram compression task, succinct data structure, M nodes, language model |
Field | DocType | Volume |
---|---|---|
Data compression ratio, Succinct data structure, Lossy compression, Computer science, Theoretical computer science, n-gram, Data compression, Trie, Bit array, Lossless compression | Conference | P09-2 |
Citations | PageRank | References |
---|---|---|
8 | 0.48 | 9 |
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Taro Watanabe | 1 | 572 | 36.86 |
Hajime Tsukada | 2 | 449 | 29.46 |
Hideki Isozaki | 3 | 934 | 64.50 |