Abstract
---
This paper proposes the Global-Local Temporal Representation (GLTR) to exploit multi-scale temporal cues in video sequences for video person Re-Identification (ReID). GLTR is constructed by first modeling the short-term temporal cues among adjacent frames, then capturing the long-term relations among inconsecutive frames. Specifically, the short-term temporal cues are modeled by parallel dilated convolutions with different temporal dilation rates to represent the motion and appearance of pedestrians. The long-term relations are captured by a temporal self-attention model to alleviate occlusions and noise in video sequences. The short- and long-term temporal cues are aggregated into the final GLTR by a simple single-stream CNN. GLTR shows substantial superiority over existing features learned with body part cues or metric learning on four widely used video ReID datasets. For instance, it achieves a Rank-1 accuracy of 87.02% on the MARS dataset without re-ranking, better than the current state of the art.
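The two components the abstract describes can be sketched as follows: parallel dilated convolutions along the temporal axis for short-term cues, and scaled dot-product self-attention over frames for long-term relations. This is a minimal NumPy illustration of the idea, not the paper's implementation; the kernel size, the dilation rates (1, 2, 3), the random depthwise weights, and the concatenate-then-average aggregation are all assumptions made for the sketch.

```python
import numpy as np

def dilated_temporal_conv(feats, weights, dilation):
    """Depthwise 1D convolution along time with a given dilation rate.
    feats: (T, D) per-frame features; weights: (K, D) temporal kernel."""
    T, D = feats.shape
    K = weights.shape[0]
    pad = dilation * (K - 1) // 2          # 'same' padding in time
    padded = np.pad(feats, ((pad, pad), (0, 0)))
    out = np.zeros_like(feats)
    for t in range(T):
        for k in range(K):
            out[t] += weights[k] * padded[t + k * dilation]
    return out

def temporal_self_attention(feats):
    """Long-term relations via softmax-normalized frame-to-frame affinities."""
    d = feats.shape[1]
    scores = feats @ feats.T / np.sqrt(d)         # (T, T) affinities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over frames
    return attn @ feats                            # re-weighted frame features

def gltr_sketch(frame_feats, rng):
    """Aggregate short-term (dilated convs) and long-term (attention) cues."""
    T, D = frame_feats.shape
    branches = [
        dilated_temporal_conv(frame_feats, 0.1 * rng.standard_normal((3, D)), r)
        for r in (1, 2, 3)                         # parallel dilation rates (assumed)
    ]
    short_term = sum(branches) / len(branches)
    long_term = temporal_self_attention(frame_feats)
    # Concatenate both cues per frame, then average-pool over time.
    return np.concatenate([short_term, long_term], axis=1).mean(axis=0)

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))               # 8 frames, 16-D features
v = gltr_sketch(feats, rng)
print(v.shape)                                     # (32,)
```

In the actual model the convolution and attention weights are learned end-to-end within a CNN; here they are random solely to demonstrate the tensor flow from a frame sequence to a single sequence-level descriptor.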
Year | DOI | Venue
---|---|---
2019 | 10.1109/ICCV.2019.00406 | 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019)

DocType | Volume | Issue
---|---|---
Conference | 2019 | 1

ISSN | Citations | PageRank
---|---|---
1550-5499 | 7 | 0.40

References | Authors
---|---
15 | 5
Name | Order | Citations | PageRank |
---|---|---|---
Jianing Li | 1 | 21 | 5.35 |
Shiliang Zhang | 2 | 1213 | 66.09 |
Jingdong Wang | 3 | 4198 | 156.76 |
Wen Gao | 4 | 11374 | 741.77 |
Qi Tian | 5 | 6443 | 331.75 |