Title
Spatial-Temporal Relation Networks For Multi-Object Tracking
Abstract
Recent progress in multiple object tracking (MOT) has shown that a robust similarity score is key to the success of trackers. A good similarity score is expected to reflect multiple cues, e.g. appearance, location, and topology, over a long period of time. However, these cues are heterogeneous, making them hard to combine in a unified network. As a result, existing methods usually encode them in separate networks or require a complex training approach. In this paper, we present a unified framework for similarity measurement between a tracklet and an object, which simultaneously encodes various cues across time. We show that a crucial principle for achieving this unified framework is the design of compatible feature representations for different cues and different sources (tracklet and object). A key technique behind this principle is a spatial-temporal relation module, which jointly models appearance and topology, and makes tracklet and object features compatible. The resulting method, named spatial-temporal relation networks (STRN), runs in a feed-forward way and can be trained in an end-to-end manner. State-of-the-art accuracy was achieved on all of the MOT15-17 benchmarks using public detection and online settings.
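To make the role of a tracklet-object similarity score concrete, the following is a minimal sketch of how such a score could drive online association. It is not the STRN model: the learned spatial-temporal relation module is replaced here by a hand-weighted mix of two cues (mean cosine similarity of appearance features over the tracklet's history, and box overlap with its latest position). All function names, the weights `w_app` and `w_loc`, and the `threshold` value are assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def similarity(tracklet, detection, w_app=0.5, w_loc=0.5):
    """Hand-weighted stand-in for a learned similarity score.

    tracklet: non-empty list of (feature, box) pairs, oldest first.
    detection: a single (feature, box) pair from the current frame.
    """
    feature, box = detection
    # Appearance cue, averaged over the whole tracklet history.
    app = sum(cosine(f, feature) for f, _ in tracklet) / len(tracklet)
    # Location cue, measured against the most recent tracklet box.
    loc = iou(tracklet[-1][1], box)
    return w_app * app + w_loc * loc


def associate(tracklets, detections, threshold=0.5):
    """Greedy one-to-one matching, highest similarity first.

    Returns a dict mapping tracklet index to detection index.
    """
    scored = sorted(
        ((similarity(t, d), i, j)
         for i, t in enumerate(tracklets)
         for j, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = {}, set(), set()
    for score, i, j in scored:
        if score < threshold:
            break  # remaining pairs are all below the threshold
        if i in used_t or j in used_d:
            continue
        matches[i] = j
        used_t.add(i)
        used_d.add(j)
    return matches
```

Greedy matching is used only to keep the sketch short; the abstract does not say which assignment procedure STRN pairs with its similarity score.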
Year
2019
DOI
10.1109/ICCV.2019.00409
Venue
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019)
Field
BitTorrent tracker, ENCODE, Pattern recognition, Computer science, Video tracking, Artificial intelligence
DocType
Journal
Volume
abs/1904.11489
Issue
1
ISSN
1550-5499
Citations
8
PageRank
0.44
References
0
Authors
4
Name, Order, Citations, PageRank
Jiarui Xu, 1, 9, 2.48
Yue Cao, 2, 574, 21.49
Zheng Zhang, 3, 436, 15.48
Han Hu, 4, 179, 5.53