Title
The Use of Word N-Grams and Parts of Speech for Hierarchical Cluster Language Modeling
Abstract
We present extensions to the backoff hierarchical class n-gram language modeling of Zitouni et al. [1] by exploring the use of part-of-speech (POS) information in hierarchical word clustering. We propose two approaches. The first uses the POS n-gram contextual distribution of a target word for clustering; the second generates a separate class tree for each group of words sharing the same POS. The resulting class tree and set of class trees, respectively, are then employed in hierarchical cluster language modeling. We evaluate the two approaches on the SRI Arabic conversational telephone speech recognition system and show that building a set of POS-specific class trees achieves a 3% relative improvement in perplexity over the model of Zitouni et al. and an 8% relative improvement over the baseline standard word n-gram model. When used for N-best rescoring, our approach also outperforms both the model of Zitouni et al. and the baseline, achieving significant word error rate (WER) reductions.
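As a rough illustration of the first approach only, the sketch below clusters words by the distribution of POS tags surrounding them and merges them greedily into a binary class tree. The toy corpus, the (previous POS, next POS) context features, and the cosine similarity measure are illustrative assumptions, not the authors' implementation or data.

# Minimal sketch (not the paper's implementation): cluster words by the
# distribution of POS tags around them, then build a binary class tree
# by greedy agglomerative merging of the most similar clusters.
from collections import Counter, defaultdict
import math

# Toy POS-tagged corpus: each sentence is a list of (word, POS) pairs.
corpus = [
    [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("down", "ADV")],
    [("a", "DET"), ("dog", "NOUN"), ("ran", "VERB"), ("fast", "ADV")],
    [("the", "DET"), ("dog", "NOUN"), ("sat", "VERB"), ("quietly", "ADV")],
]

# Context distribution: counts of (previous POS, next POS) for each word.
contexts = defaultdict(Counter)
for sent in corpus:
    tags = ["<s>"] + [t for _, t in sent] + ["</s>"]
    for i, (word, _) in enumerate(sent):
        contexts[word][(tags[i], tags[i + 2])] += 1

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1 if k in c2)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Greedy agglomerative clustering: each cluster is (tree, pooled counts).
clusters = [(word, contexts[word]) for word in contexts]
while len(clusters) > 1:
    # Find the most similar pair of clusters and merge them into one node.
    i, j = max(
        ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
        key=lambda ab: cosine(clusters[ab[0]][1], clusters[ab[1]][1]),
    )
    (t1, c1), (t2, c2) = clusters[i], clusters[j]
    merged = ((t1, t2), c1 + c2)  # pooled context counts for the new node
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

print(clusters[0][0])  # nested tuples = the hierarchical word class tree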
Year
2006
DOI
10.1109/ICASSP.2006.1660206
Venue
2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13
Keywords
word error rate, automatic speech recognition, telephony, speech processing, part of speech, speech recognition, training data, hierarchical clustering, language model, statistics, parts of speech, probability, natural languages
Field
Speech processing, Perplexity, Computer science, Part of speech, Natural language processing, Artificial intelligence, Cluster analysis, Language model, Hierarchical clustering, Pattern recognition, Word error rate, Speech recognition, Natural language
DocType
Conference
ISSN
1520-6149
Citations
6
PageRank
0.59
References
3
Authors
2
Name               Order   Citations   PageRank
Wen Wang           1       327         29.31
Dimitra Vergyri    2       373         36.97