Estimating Visual Focus of Attention in Multiparty Meetings using Deep Convolutional Neural Networks. - Citegraph

Paper Info

Title
Estimating Visual Focus of Attention in Multiparty Meetings using Deep Convolutional Neural Networks.

Abstract
Convolutional neural networks (CNNs) are employed to estimate the visual focus of attention (VFoA), also called gaze direction, in multiparty face-to-face meetings on the basis of multimodal nonverbal behaviors including head pose, direction of the eyeball, and presence/absence of utterance. To reveal the potential of CNNs, we focus on aspects of multimodal and multiparty fusion including individual/group models, early/late fusion, and robustness when using inputs from image-based trackers. In contrast to the individual model that separately targets each person specific to one's seat, the group model aims to jointly estimate the gaze directions of all participants. Experiments confirmed that the group model outperformed the individual model, especially in predicting listeners' VFoA when the inputs did not include eyeball directions. This result indicates that the group CNN model can implicitly learn underlying conversation structures, e.g., the listeners' gazes converge on the speaker. When the eyeball direction feature is available, both models outperformed the Bayes models used for comparison. In this case, the individual model was superior to the group model, particularly in estimating the speaker's VFoA. Moreover, it was revealed that in group models, two-stage late fusion, which integrates individual features first and multiparty features second, outperformed other structures. Furthermore, our experiment confirmed that image-based tracking can provide a comparable level of performance to that of sensor-based measurements. Overall, the results suggest that the CNN is a promising approach for VFoA estimation.

Year	DOI	Venue
2018	10.1145/3242969.3242973	ICMI
Keywords	Field	DocType
gaze, visual focus of attention, meeting analysis, multimodal fusion, deep learning, convolutional neural networks	Computer vision, BitTorrent tracker, Conversation, Gaze, Convolutional neural network, Computer science, Speech recognition, Nonverbal communication, Robustness (computer science), Artificial intelligence, Deep learning, Bayes' theorem	Conference
ISBN	Citations	PageRank
978-1-4503-5692-3	0	0.34
References	Authors
30	3

Authors (3 rows)

Name	Order	Citations	PageRank
Kazuhiro Otsuka	1	619	54.15
Keisuke Kasuga	2	0	0.34
Martina Köhler	3	0	0.34

Cited by (0 rows)

References (30 rows)
