Human Object Interaction Detection via Multi-level Conditioned Network - Citegraph

Paper Info

Title
Human Object Interaction Detection via Multi-level Conditioned Network

Abstract
As one of the essential problems in scene understanding, human object interaction detection (HOID) aims to recognize fine-grained, object-specific human actions, which demands capabilities in both visual perception and reasoning. Existing methods based on convolutional neural networks (CNNs) utilize diverse visual features for HOID, but these features are insufficient for understanding complex human object interactions. To enhance the reasoning capability of CNNs, we propose a novel multi-level conditioned network that fuses extra spatial-semantic knowledge with visual features. Specifically, we construct a multi-branch CNN as the backbone for multi-level visual representation. We then encode extra knowledge, including human body structure and object context, as conditions that dynamically influence the feature extraction of the CNN through affine transformation and an attention mechanism. Finally, we fuse the modulated multimodal features to distinguish the interactions. The proposed method is evaluated on the two most frequently used benchmarks, HICO-DET and V-COCO. The experimental results show that our method is superior to state-of-the-art methods.
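
The abstract mentions conditioning visual features by affine transformation and attention. A minimal sketch of that general idea follows, in plain Python. It is not the paper's architecture: the function names, the weight matrices w_gamma and w_beta, and the choice of a per-channel scale-and-shift followed by softmax weighting are all assumptions made for illustration only.

```python
import math


def affine_condition(features, condition, w_gamma, w_beta):
    """Scale and shift each feature channel with parameters predicted
    from a condition vector (hypothetical linear predictors)."""
    # gamma starts at 1 so that a zero condition leaves the features unscaled.
    gamma = [1.0 + sum(w * c for w, c in zip(row, condition)) for row in w_gamma]
    beta = [sum(w * c for w, c in zip(row, condition)) for row in w_beta]
    return [g * f + b for g, f, b in zip(gamma, features, beta)]


def attention_weights(scores):
    """Softmax over relevance scores (numerically stabilised)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, scores):
    """Reweight features by softmax attention derived from the scores."""
    return [w * f for w, f in zip(attention_weights(scores), features)]


# Toy usage: two feature channels, a two-dimensional condition vector.
features = [1.0, 2.0]
condition = [1.0, 0.0]            # assumed encoding of the extra knowledge
w_gamma = [[0.5, 0.0], [0.0, 0.5]]
w_beta = [[0.1, 0.0], [0.2, 0.0]]

modulated = affine_condition(features, condition, w_gamma, w_beta)
attended = attend(modulated, scores=[0.0, 0.0])
```

In this toy setting the condition vector changes both the scale and the offset of each channel, which is the sense in which extra knowledge can "dynamically influence" feature extraction; a real model would learn the predictors and apply them to convolutional feature maps.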

Year	2020
DOI	10.1145/3372278.3390671
Venue	ICMR '20: International Conference on Multimedia Retrieval, Dublin, Ireland, June 2020
DocType	Conference
ISBN	978-1-4503-7087-5
Citations	0
PageRank	0.34
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Xu Sun	1	0	0.34
Xinwen Hu	2	2	1.77
Tongwei Ren	3	328	30.22
Gang-Shan Wu	4	27	6.75
