Title
A succinct N-gram language model
Abstract
Efficient processing of tera-scale text data is an important research topic. This paper proposes lossless compression of N-gram language models based on LOUDS, a succinct data structure. LOUDS succinctly represents a trie with M nodes as a 2M + 1 bit string. We compress it further for the N-gram language model structure. We also use variable-length coding and block-wise compression to compress the values associated with nodes. Experimental results on three large-scale N-gram compression tasks show a significant compression rate without any loss.
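As background for the abstract's claim that LOUDS encodes a trie with M nodes in 2M + 1 bits, the sketch below shows the standard LOUDS construction (a virtual super-root "10" prefix, then "1" per child followed by a terminating "0" for each node in breadth-first order). This is an illustration of the generic data structure only, not the paper's N-gram-specific implementation; the `children` dictionary representation is an assumption for the example.

```python
from collections import deque

def louds_bits(children):
    """Encode a trie as a LOUDS bit string (level order).

    `children` maps a node id to the list of its child ids; node 0 is
    the root.  A trie with M nodes yields 2M + 1 bits: the "10"
    super-root prefix, then '1' * degree + '0' per node in BFS order.
    """
    bits = "10"                       # virtual super-root
    queue = deque([0])
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        bits += "1" * len(kids) + "0" # unary degree, then terminator
        queue.extend(kids)
    return bits

# 3-node trie: a root with two leaf children -> 2*3 + 1 = 7 bits
encoding = louds_bits({0: [1, 2]})
assert encoding == "1011000"
```

Navigation (first child, next sibling, parent) is then supported in constant time via rank/select operations over this bit string, which is what makes the representation practical despite its small size.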
Year
2009
Venue
ACL/IJCNLP (Short Papers)
Keywords
significant compression rate, lossless compression, succinct n-gram language model, n-gram language model structure, tera-scale text data, louds succinctly, block-wise compression, n-gram language model, large-scale n-gram compression task, succinct data structure, m node, language model
Field
Data compression ratio, Succinct data structure, Lossy compression, Computer science, Theoretical computer science, n-gram, Data compression, Trie, Bit array, Lossless compression
DocType
Conference
Volume
P09-2
Citations
8
PageRank
0.48
References
9
Authors
3
Name            Order  Citations  PageRank
Taro Watanabe   1      572        36.86
Hajime Tsukada  2      449        29.46
Hideki Isozaki  3      934        64.50