Title
Estimating Visual Focus of Attention in Multiparty Meetings using Deep Convolutional Neural Networks.
Abstract
Convolutional neural networks (CNNs) are employed to estimate the visual focus of attention (VFoA), also called gaze direction, in multiparty face-to-face meetings on the basis of multimodal nonverbal behaviors, including head pose, eyeball direction, and the presence or absence of utterances. To reveal the potential of CNNs, we focus on aspects of multimodal and multiparty fusion, including individual/group models, early/late fusion, and robustness when using inputs from image-based trackers. The individual model separately estimates the VFoA of each participant with a model specific to that participant's seat. In contrast, the group model jointly estimates the gaze directions of all participants. Experiments confirmed that the group model outperformed the individual model, especially in predicting listeners' VFoA when the inputs did not include eyeball directions. This result indicates that the group CNN model can implicitly learn underlying conversation structures, e.g., that the listeners' gazes converge on the speaker. When the eyeball direction feature was available, both models outperformed the Bayes models used for comparison. In this case, the individual model was superior to the group model, particularly in estimating the speaker's VFoA. Moreover, among the group models, two-stage late fusion, which integrates individual features first and multiparty features second, outperformed the other structures. Furthermore, our experiments confirmed that image-based tracking can provide performance comparable to that of sensor-based measurements. Overall, the results suggest that the CNN is a promising approach for VFoA estimation.
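The abstract contrasts early fusion with two-stage late fusion in the group model. Below is a minimal data-flow sketch of that contrast. It works under stated assumptions and is not the authors' architecture. The per-participant feature layout (head pose yaw/pitch, eyeball yaw/pitch, an utterance flag), the weights, and all names (`early_fusion_input`, `two_stage_late_fusion`, `W_INDIVIDUAL`, `W_GROUP`) are hypothetical. Small bias-free dense layers with ReLU stand in for the paper's convolutional layers, and no learning is shown. The sketch only illustrates where the features of different participants are combined.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]
# Hypothetical per-participant features for one frame:
# "head" = head pose (yaw, pitch), "eye" = eyeball direction (yaw, pitch),
# "utt" = utterance flag (1.0 speaking, 0.0 silent).
Participant = Dict[str, Vector]


def concat(*parts: Vector) -> Vector:
    """Concatenate feature vectors end to end."""
    out: Vector = []
    for part in parts:
        out.extend(part)
    return out


def dense_relu(x: Vector, weights: Matrix) -> Vector:
    """Bias-free dense layer followed by ReLU: one output per weight row.

    Stand-in for a learned (convolutional) layer; weights are placeholders.
    """
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def person_features(p: Participant) -> Vector:
    """All modalities of one participant as a flat vector (5 values)."""
    return concat(p["head"], p["eye"], p["utt"])


def early_fusion_input(group: List[Participant]) -> Vector:
    """Early fusion: all modalities of all participants form one input vector."""
    return concat(*(person_features(p) for p in group))


def two_stage_late_fusion(group: List[Participant],
                          w_individual: Matrix,
                          w_group: Matrix) -> Vector:
    # Stage 1: integrate each participant's own features; the same weights are
    # reused for every seat (an assumption of this sketch).
    individual = [dense_relu(person_features(p), w_individual) for p in group]
    # Stage 2: integrate the individual representations across participants.
    return dense_relu(concat(*individual), w_group)


# Hypothetical two-person example with binary-exact values.
EXAMPLE_GROUP: List[Participant] = [
    {"head": [0.5, 0.0], "eye": [0.25, 0.0], "utt": [1.0]},     # speaking
    {"head": [-0.5, 0.25], "eye": [-0.25, 0.0], "utt": [0.0]},  # silent
]
W_INDIVIDUAL: Matrix = [[1.0, 0.0, 1.0, 0.0, 0.0],  # head yaw + eye yaw
                        [0.0, 0.0, 0.0, 0.0, 1.0]]  # utterance flag
W_GROUP: Matrix = [[1.0, 0.0, 1.0, 0.0],
                   [0.0, 1.0, 0.0, 1.0]]

if __name__ == "__main__":
    print(len(early_fusion_input(EXAMPLE_GROUP)))  # 10
    print(two_stage_late_fusion(EXAMPLE_GROUP, W_INDIVIDUAL, W_GROUP))  # [0.75, 1.0]
```

In the early-fusion variant, a single network sees every participant's modalities from its first layer. In the two-stage variant, participants interact only in the second stage, after each person's features have been summarized.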
Year
2018
DOI
10.1145/3242969.3242973
Venue
ICMI
Keywords
gaze, visual focus of attention, meeting analysis, multimodal fusion, deep learning, convolutional neural networks
Field
Computer vision, BitTorrent tracker, Conversation, Gaze, Convolutional neural network, Computer science, Speech recognition, Nonverbal communication, Robustness (computer science), Artificial intelligence, Deep learning, Bayes' theorem
DocType
Conference
ISBN
978-1-4503-5692-3
Citations
0
PageRank
0.34
References
30
Authors
3
Name             Order  Citations  PageRank
Kazuhiro Otsuka  1      619        54.15
Keisuke Kasuga   2      0          0.34
Martina Köhler   3      0          0.34