Title
Speech and Language Processing for Multimodal Human-Computer Interaction
Abstract
In this paper, we describe our recent work at Microsoft Research, in the project codenamed Dr. Who, aimed at developing enabling technologies for speech-centric multimodal human-computer interaction. In particular, we present in detail MiPad, the first Dr. Who application, which specifically addresses the mobile user interaction scenario. MiPad is a wireless mobile PDA prototype that enables users to accomplish many common tasks using a multimodal spoken language interface and wireless-data technologies. It fully integrates continuous speech recognition and spoken language understanding, and provides a novel solution to the prevailing problem of pecking with tiny styluses or typing on minuscule keyboards on today's PDAs and smart phones. Although the current implementation is incomplete, the user study reported in this paper shows that speech and pen have the potential to significantly improve the user experience. In this system-oriented paper we describe the main components of MiPad, with a focus on robust speech processing and spoken language understanding. The MiPad components discussed in detail include: distributed speech recognition considerations in the design of the speech processing algorithms; a stereo-based speech feature enhancement algorithm used for noise-robust front-end speech processing; Aurora2 evaluation results for this front-end processing; speech feature compression (source coding) and error protection (channel coding) for distributed speech recognition in MiPad; HMM-based acoustic modeling for continuous speech recognition decoding; a unified language model integrating a context-free grammar and an N-gram model for speech decoding; schema-based knowledge representation for MiPad's personal information management task; a unified statistical framework that integrates speech recognition, spoken language understanding and dialogue management; the robust natural language parser used in MiPad to process the speech recognizer's output; machine-aided grammar learning and development for spoken language understanding in the MiPad task; Tap & Talk multimodal interaction and user interface design; back-channel communication and MiPad's error repair strategy; and finally, user study results that demonstrate the superior throughput achieved by the Tap & Talk multimodal interaction over the existing pen-only PDA interface. These user study results highlight the crucial role played by speech in enhancing the overall user experience in MiPad-like human-computer interaction devices.
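The noise-robust front end named in the abstract and keywords is a stereo-based feature enhancement (SPLICE-style) algorithm: bias corrections are learned from time-aligned stereo pairs of clean and noisy cepstral frames and applied to noisy features at run time. The sketch below illustrates only this general stereo-based bias-removal idea and is not the authors' implementation; the function names, the mixture count, the synthetic stand-in data, and the use of scikit-learn's GaussianMixture are illustrative assumptions.

```python
# Minimal sketch of a SPLICE-style stereo-based feature enhancement step.
# Not the MiPad implementation; names, shapes, and the scikit-learn GMM
# are illustrative assumptions only.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_splice(noisy_cep, clean_cep, n_mixtures=64, seed=0):
    """Fit a GMM over noisy cepstra and learn one bias vector per mixture
    from time-aligned stereo (clean, noisy) training frames."""
    gmm = GaussianMixture(n_components=n_mixtures,
                          covariance_type="diag",
                          random_state=seed).fit(noisy_cep)
    post = gmm.predict_proba(noisy_cep)      # p(s | y) for each frame
    diff = clean_cep - noisy_cep             # x - y for each frame
    # Posterior-weighted average of (x - y) gives the correction r_s.
    bias = (post.T @ diff) / post.sum(axis=0, keepdims=True).T
    return gmm, bias

def enhance(noisy_cep, gmm, bias):
    """MMSE-style estimate: x_hat = y + sum_s p(s|y) * r_s."""
    post = gmm.predict_proba(noisy_cep)
    return noisy_cep + post @ bias

# Usage with random stand-in data (13-dimensional cepstral frames):
rng = np.random.default_rng(0)
clean = rng.normal(size=(2000, 13))
noisy = clean + rng.normal(scale=0.5, size=(2000, 13))
gmm, bias = train_splice(noisy, clean, n_mixtures=8)
denoised = enhance(noisy, gmm, bias)
```

Because the run-time correction reduces to a posterior-weighted sum of per-mixture bias vectors, this kind of enhancement stays cheap on the client side, which is consistent with the distributed (client-server) speech recognition design discussed in the paper.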
Year
2004
DOI
10.1023/B:VLSI.0000015095.19623.73
Venue
Journal of Signal Processing Systems
Keywords
speech-centric multimodal interface, human-computer interaction, robust speech recognition, SPLICE algorithm, denoising, online noise estimation, distributed speech processing, speech feature encoding, error protection, spoken language understanding, automatic grammar learning, semantic schema, user study
Field
Speech corpus, Speech processing, Multimodal interaction, Speech analytics, Computer science, Speech recognition, Human–computer interaction, User interface, Speech technology, Spoken language, Language model
DocType
Journal
Volume
36
Issue
2/3
ISSN
0922-5773
Citations
4
PageRank
0.42
References
10
Authors
9
Order  Name              Citations / PageRank
1      Deng, Li          9691728.14
2      Ye-Yi Wang        40.42
3      Kuansan Wang      131095.70
4      A. Acero          7511.56
5      Hsiao-Wuen Hon    1719354.37
6      James G. Droppo   40.42
7      C. Boulis         423.67
8      Milind Mahajan    344.13
9      Xuedong Huang     1390283.19