Title
Learning Spatial-Temporal Representations Over Walking Tracklet for Long-Term Person Re-Identification in the Wild
Abstract
Long-term person re-identification (re-ID) aims to establish identity correspondence for a Target Subject of Interest (TSI) captured by surveillance cameras over a long time interval. Compared with the conventional short-term re-ID studied in most existing work, it faces an additional problem: significant clothing changes over time. This variation violates the assumption underlying prior short-term re-ID approaches and therefore causes significant difficulties when conventional short-term methods are applied. To address the problem, this paper proposes to learn a hybrid feature representation via a two-stream network named SpTSkM, comprising a spatial-temporal stream and a skeleton motion stream. The former operates directly on image sequences and tends to learn identity-related spatial-temporal patterns such as body geometric structure and body movement. The latter operates on normalized 3D skeletons by adapting a graph convolutional network and tends to learn pure motion patterns from skeleton sequences. Both streams extract fine-grained, time-gap-stable information that is robust to appearance changes in long-term re-ID while remaining discriminative enough to distinguish different people. The final matching metric is obtained by combining the two streams through score-level fusion. In addition, we collect a Cloth-Varying vIDeo re-ID (CVID-reID) dataset specifically for long-term re-ID. It contains video tracklets of celebrities posted on the Internet. These videos are captured under extremely different scenarios, with highly dynamic backgrounds, diverse camera views, and abundant clothing variations for each TSI. These factors make CVID-reID more complicated and closer to practice. Our experiments demonstrate the difficulty of long-term person re-ID and validate the effectiveness of the proposed SpTSkM, which achieves the best performance in our experiments.
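The score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion formula, so everything here is an assumption made for illustration: the min-max normalization, the weighted sum, the `weight` parameter, and the function names `min_max_normalize`, `fuse_scores`, and `rank_gallery` are all hypothetical. Each input is a query-by-gallery distance matrix, one per stream.

```python
def min_max_normalize(matrix):
    """Rescale all entries to [0, 1] so both streams are on a comparable scale."""
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: every distance is identical.
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


def fuse_scores(dist_spatial_temporal, dist_skeleton, weight=0.5):
    """Weighted sum of the two normalized distance matrices (assumed fusion rule)."""
    a = min_max_normalize(dist_spatial_temporal)
    b = min_max_normalize(dist_skeleton)
    return [
        [weight * x + (1.0 - weight) * y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def rank_gallery(fused):
    """For each query, return gallery indices sorted from best to worst match."""
    return [sorted(range(len(row)), key=row.__getitem__) for row in fused]


# Toy data: 2 query tracklets compared against 3 gallery tracklets.
d_st = [[0.2, 0.9, 0.5], [0.8, 0.1, 0.6]]  # spatial-temporal stream distances
d_sk = [[0.3, 0.7, 0.4], [0.9, 0.5, 0.2]]  # skeleton motion stream distances
fused = fuse_scores(d_st, d_sk, weight=0.5)
ranking = rank_gallery(fused)
```

In this sketch a smaller fused distance means a better match, and `weight` controls how much the spatial-temporal stream contributes relative to the skeleton motion stream.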
Year: 2021
DOI: 10.1109/TMM.2020.3028461
Venue: IEEE TRANSACTIONS ON MULTIMEDIA
Keywords: Skeleton, Tracking, Three-dimensional displays, Image color analysis, Cameras, Streaming media, Trajectory, Long-term person re-identification, space-time patterns, 3D skeleton normalization, dataset collection
DocType: Journal
Volume: 23
ISSN: 1520-9210
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name          Order  Citations  PageRank
Peng Zhang    1      19         1.67
Jingsong Xu   2      80         12.87
Qiang Wu      3      534        54.06
Yan Huang     4      226        27.65
Xianye Ben    5      131        10.56