Title
Enabling effective design of multimodal interfaces for speech-to-speech translation system: An empirical study of longitudinal user behaviors over time and user strategies for coping with errors
Abstract
This study provides an empirical analysis of long-term changes in user behavior and of varying user strategies during cross-lingual interaction using the USC/SAIL multimodal speech-to-speech (S2S) translation system. The goal is to inform user-adaptive designs of such systems. A 4-week study based on medical scenarios provides the basis for the analysis. The data analyzed include user interviews, post-session surveys, and extensive system logs that were post-processed and annotated. The annotations measured meaning transfer rates using human evaluations and a scale defined here, called the concept matching score. First, qualitative data analysis investigates user strategies for dealing with errors, such as repeat, rephrase, change topic, and start over, as well as the participants' self-reported longitudinal adaptation to errors. Post-session surveys explore participant experience with the system and point to a trend of user-perceived performance increasing over time. Analysis of the log data provides further insight. Users chose to let utterances proceed through the system with some degradation of their intended meaning (84% of original concepts), even after they observed potential errors in the visual output from the speech recognizer. The rejected utterances, on average, had only 25% of the original concepts. The outcome of this user filtering, after the complete channel transfer through the S2S system, is that 91% of the successful turns transfer at least half the intended concepts, while 90% of the user-rejected turns would have conveyed less than half the intended meaning. Compared to the speech-only modality, the multimodal interface yields a 24% relative improvement in the confirmation mode and a 31% relative improvement in the choice mode. The analysis also shows that users of the multimodal interface change their strategies over time by accepting more system-produced choices. This behavior can expedite communication, seeking an operating balance between user strategies and system performance factors. Lastly, user utterance length is analyzed. Longer utterances generally deliver more information per utterance, but potentially at the cost of increased processing degradation. The analysis demonstrates that users shorten their utterances after unsuccessful turns and lengthen them after successful turns, and that a learning effect strengthens this behavior over the duration of the study.
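The "relative improvement" figures quoted in the abstract can be read through the standard formula (improved - baseline) / baseline. The sketch below is only an illustration of that arithmetic: the abstract does not state the underlying metric or its absolute values, so the function name `relative_improvement` and the two rates are hypothetical, and the sketch assumes a metric where higher is better.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline` as a fraction.

    Assumes a higher-is-better metric (for example, a success rate).
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Hypothetical values, chosen only to illustrate the arithmetic;
# they are NOT taken from the paper.
speech_only_rate = 0.50
multimodal_rate = 0.62

gain = relative_improvement(speech_only_rate, multimodal_rate)
print(f"{gain:.0%}")  # prints "24%"
```

Note that a 24% relative improvement is not the same as a gain of 24 percentage points: in this hypothetical case the absolute gain is 12 points.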
Year: 2013
DOI: 10.1016/j.csl.2012.02.001
Venue: Computer Speech & Language
Keywords: user adaptive design, user interview, user strategy, user utterance length, multimodal interface, long-term user, longitudinal user behavior, post-session survey, relative improvement, user behavior, original concept, varying user strategy, empirical study, effective design, user interfaces, HCI
Field: Computer science, Utterance, Artificial intelligence, User modeling, Natural language processing, Computer user satisfaction, Empirical research, Learning effect, Communication channel, Speech recognition, Speech translation, User interface, Machine learning
DocType: Journal
Volume: 27
Issue: 2
ISSN: 0885-2308
Citations: 2
PageRank: 0.40
References: 21
Authors: 3
Name | Order | Citations | PageRank
Jongho Shin | 1 | 117 | 11.70
Georgiou Panayiotis | 2 | 428 | 55.79
Shrikanth Narayanan | 3 | 4 | 2.55