Title
An Analysis Of Time-Aggregated And Time-Series Features For Scoring Different Aspects Of Multimodal Presentation Data
Abstract
We present a technique for the automated assessment of public speaking and presentation proficiency based on the analysis of concurrently recorded speech and motion-capture data. For the Kinect motion-capture data, we examine both time-aggregated and time-series features. The former are statistical functionals of body-part position and/or velocity computed over the entire series; the latter, dubbed histograms of co-occurrences, capture how often different broad postural configurations co-occur within different time lags of each other over the evolution of the multimodal time series. We examine the relative utility of these features, along with curated features derived from the speech stream, in predicting human-rated scores of different aspects of public speaking and presentation proficiency. We further show that these features outperform the human inter-rater agreement baseline for a subset of the analyzed aspects.
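The abstract does not specify the exact feature computation; as a rough, hypothetical sketch of the histogram-of-co-occurrences idea, the following assumes each frame has already been quantized into one of a small number of discrete postural clusters. The function name, lag set, and per-lag normalization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cooccurrence_histogram(states, n_states, lags=(1, 5, 10)):
    """Sketch of a histogram-of-co-occurrences feature (hypothetical).

    For each lag d, counts how often the postural-cluster pair (a, b)
    occurs at frame positions (t, t + d), normalizes each lag's counts
    to sum to 1, and concatenates the flattened histograms.
    """
    feats = []
    for d in lags:
        counts = np.zeros((n_states, n_states))
        for t in range(len(states) - d):
            counts[states[t], states[t + d]] += 1.0
        total = counts.sum()
        if total > 0:
            counts /= total  # per-lag normalization (assumption)
        feats.append(counts.ravel())
    return np.concatenate(feats)

# Toy sequence of 4 postural cluster labels over 8 frames
seq = [0, 1, 1, 2, 0, 1, 3, 2]
vec = cooccurrence_histogram(seq, n_states=4, lags=(1, 2))
# vec has length 2 lags * 4 * 4 = 32
```

Such a vector can then be fed, alongside the time-aggregated functionals, to an ordinary regressor trained against the human-rated scores.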
Year
2015
Venue
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5
Keywords
speech recognition, human-computer interaction, computational paralinguistics, multimodal computing
Field
Motion capture, Histogram, Pattern recognition, Computer science, Speech recognition, Feature set, Public speaking, Artificial intelligence
DocType
Conference
Citations
0
PageRank
0.34
References
10
Authors
5
Name | Order | Citations | PageRank
Vikram Ramanarayanan | 1 | 70 | 13.97
Lei Chen | 2 | 84 | 7.63
Chee Wee Leong | 3 | 153 | 15.10
Gary Feng | 4 | 50 | 3.73
David Suendermann-Oeft | 5 | 3 | 2.17