Title |
---|
No-Frills Human-Object Interaction Detection: Factorization, Appearance and Layout Encodings, and Training Techniques. |
Abstract |
---|
We show that with an appropriate factorization, and encodings of layout and appearance constructed from outputs of pretrained object detectors, a relatively simple model outperforms more sophisticated approaches on human-object interaction detection. Our model includes factors for detection scores, human and object appearance, and coarse (box-pair configuration) and optionally fine-grained layout (human pose). We also develop training techniques that improve learning efficiency by: (i) eliminating train-inference mismatch; (ii) rejecting easy negatives during mini-batch training; and (iii) using a ratio of negatives to positives that is two orders of magnitude larger than existing approaches while constructing training mini-batches. We conduct a thorough ablation study to understand the importance of different factors and training techniques using the challenging HICO-Det dataset. |
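
As a rough illustration of the factorized scoring the abstract describes, the Python sketch below multiplies the human and object detection scores by an interaction probability. The function name `hoi_score`, the factor names, and the choice to sum per-factor logits before a sigmoid are assumptions made for illustration; the abstract lists the factors but does not say how they are combined.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def hoi_score(human_det: float, object_det: float, factor_logits: dict) -> float:
    """Hypothetical factorized score for one human-object-interaction candidate.

    human_det, object_det: confidences from a pretrained object detector.
    factor_logits: one logit per factor, e.g. human appearance, object
    appearance, box-pair layout, and optionally human pose (names assumed).
    """
    # Assumption: factor logits are summed, then squashed to a probability.
    interaction_prob = sigmoid(sum(factor_logits.values()))
    # The detection scores act as multiplicative factors on the final score.
    return human_det * object_det * interaction_prob


example = hoi_score(
    0.9,
    0.8,
    {"human_appearance": 1.2, "object_appearance": 0.4, "box_layout": -0.3},
)
```

Because the pose factor is optional, leaving its key out of `factor_logits` simply removes its contribution to the sum in this sketch.
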
Year | Venue | DocType |
---|---|---|
2018 | arXiv: Computer Vision and Pattern Recognition | Journal |
Volume | Citations | PageRank |
---|---|---|
abs/1811.05967 | 0 | 0.34 |
References | Authors |
---|---|
0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Tanmay Gupta | 1 | 2 | 2.41 |
Alexander G. Schwing | 2 | 696 | 51.78 |
Derek Hoiem | 3 | 4998 | 302.66 |