Title
Mining Spoken Dialogue Corpora for System Evaluation and Modeling
Abstract
We are interested in the problem of modeling and evaluating spoken language systems in the context of human-machine dialogs. Spoken dialog corpora allow for a multidimensional analysis of speech recognition and language understanding models of dialog systems. Therefore language models can be directly trained based either on the dialog history or its equivalence class (or cluster). In this paper we propose an algorithm to mine dialog traces which exhibit similar patterns and are identified by the same class. For this purpose we apply data clustering methods to large human-machine spoken dialogue corpora. The resulting clusters can be used for system evaluation and language modeling. By clustering dialog traces we expect to learn about the behavior of the system with regards to not only the automation rate but the nature of the interaction (e.g. easy vs difficult dialogs). The equivalence classes can also be used in order to automatically adapt the language model, the understanding module and the dialogue strategy to better fit the kind of interaction detected. This paper investigates different ways for encoding dialogues into multidimensional structures and different clustering methods. Preliminary results are given for cluster interpretation and dynamic model adaptation using the clusters obtained.
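The abstract does not say which encoding or clustering method the authors use, so the following is only a minimal sketch of the general idea: each dialog trace is encoded as a count vector and the vectors are grouped with plain k-means. The dialog-act labels in `ACTS`, the sample `traces`, and the helper names `encode`, `sq_dist`, and `kmeans` are all hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical dialog-act inventory; the abstract does not specify the encoding.
ACTS = ["greeting", "request", "confirm", "reprompt", "transfer"]


def encode(trace):
    """Map a dialog trace (a list of act labels) to a count vector over ACTS."""
    counts = Counter(trace)
    return [float(counts[a]) for a in ACTS]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic init: the first k points are the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        # An empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up traces: two short, smooth dialogs and two with repeated reprompts.
traces = [
    ["greeting", "request", "confirm"],
    ["greeting", "request", "reprompt", "reprompt", "reprompt", "transfer"],
    ["greeting", "request", "confirm", "confirm"],
    ["greeting", "reprompt", "reprompt", "request", "reprompt", "transfer"],
]
labels, centroids = kmeans([encode(t) for t in traces], k=2)
```

In this toy setup the two reprompt-heavy traces fall into one cluster and the two smooth traces into the other, which loosely mirrors the "easy vs difficult dialogs" distinction the abstract mentions. A cluster label of this kind could then select a cluster-specific language model, though how the paper performs that adaptation is not stated here.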
Year
2004
Venue
EMNLP
Keywords
language model, speech recognition, data clustering
Field
Computer science, System evaluation, Speech recognition, Natural language processing, Artificial intelligence
DocType
Conference
Volume
W04-32
Citations
7
PageRank
0.50
References
4
Authors
3
Name                 Order  Citations  PageRank
Frederic Bechet      1      14         1.78
Giuseppe Riccardi    2      1046       101.15
Dilek Hakkani-Tür    3      282        17.30