Abstract
---
We describe the IBM systems submitted to the NIST RT06s Speech-to-Text (STT) evaluation campaign on the CHIL lecture meeting data for three conditions: multiple distant microphone (MDM), single distant microphone (SDM), and individual headset microphone (IHM). The system building process is similar to that of the IBM conversational telephone speech recognition system. However, the best models for the far-field conditions (SDM and MDM) proved to be the ones that use neither variance normalization nor vocal tract length normalization; instead, feature-space minimum phone error discriminative training yielded the best results. Due to the relatively small amount of CHIL-domain data, the acoustic models of our systems are built on publicly available meeting corpora, with maximum a posteriori adaptation applied twice to CHIL data during training: first to the initial speaker-independent model, and subsequently to the minimum phone error model. For language modeling, we utilized meeting transcripts, text from scientific conference proceedings, and spontaneous telephone conversations. On development data, chosen in our work to be the 2005 CHIL-internal STT evaluation test set, the resulting language model provided a 4% absolute improvement in word error rate (WER) compared to the model used in last year's CHIL evaluation. Furthermore, the developed STT system significantly outperformed last year's results, reducing the WER on close-talking microphone data from 36.9% to 25.4% on our development set. In the NIST RT06s evaluation campaign, both the MDM and SDM systems scored well; however, the IHM system performed poorly due to unsuccessful cross-talk removal.
Year | DOI | Venue |
---|---|---|
2006 | 10.1007/11965152_38 | MLMI |
Keywords | Field | DocType
---|---|---
chil evaluation, chil lecture meeting data, last year, ibm rich transcription spring, ibm system, chil data, speech-to-text system, ihm system, chil-domain data, chil-internal stt evaluation test, development data, close-talking microphone data, language model, speech to text, speech recognition, word error rate, feature space | Headset, Data modeling, Speech synthesis, Computer science, Word error rate, Speech recognition, NIST, Language model, Microphone, Acoustic model | Conference
Volume | ISSN | ISBN
---|---|---
4299 | 0302-9743 | 3-540-69267-3
Citations | PageRank | References
---|---|---
4 | 0.46 | 9
Authors
---
10
Name | Order | Citations | PageRank |
---|---|---|---|
Jing Huang | 1 | 2464 | 186.09 |
Martin Westphal | 2 | 4 | 0.46 |
Stanley F. Chen | 3 | 1723 | 219.64 |
Olivier Siohan | 4 | 392 | 34.33 |
Daniel Povey | 5 | 2442 | 231.75 |
Vit Libal | 6 | 32 | 4.32 |
Alvaro Soneiro | 7 | 4 | 0.46 |
Henrik Schulz | 8 | 39 | 5.76 |
Thomas Ross | 9 | 4 | 0.46 |
Gerasimos Potamianos | 10 | 1113 | 113.80 |