Title: Spatiotemporal visual-semantic embedding network for zero-shot action recognition
Abstract
Zero-shot learning (ZSL) has recently attracted increasing attention in visual tasks such as action recognition. We propose a spatiotemporal visual-semantic embedding network (STVSEM) for zero-shot action recognition. First, motivated by the strong results of two-stream architectures in recent action recognition work, we incorporate a two-stream module into our network that jointly uses spatial features (e.g., RGB appearance) and temporal optical flow as visual features, significantly improving visual expressive capability. Second, to alleviate the semantic loss that typically arises in embedding-based ZSL methods, an autoencoder is introduced to obtain a better semantic representation and to transfer semantic relationship information from seen classes to unseen classes. Finally, a joint embedding mechanism that explores and exploits the relationships between visual data and semantic information in an intermediate space is employed to bridge the gap between vision and semantics. Experimental results on the Charades and UCF101 datasets show that the proposed method outperforms state-of-the-art methods in accuracy, demonstrating its effectiveness. (C) 2019 SPIE and IS&T
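The abstract's pipeline (two-stream visual features, a semantic autoencoder, and nearest-neighbor classification in a joint embedding space) can be illustrated with a minimal sketch. All dimensions, weights, class names, and attribute vectors below are hypothetical placeholders, not the paper's learned parameters or actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions (not taken from the paper).
D_RGB, D_FLOW, D_SEM, D_JOINT = 8, 8, 6, 4

# Random matrices stand in for learned projection weights.
W_vis = rng.normal(size=(D_RGB + D_FLOW, D_JOINT))  # fused visual -> joint space
W_enc = rng.normal(size=(D_SEM, D_JOINT))           # semantic encoder
W_dec = W_enc.T                                     # tied decoder (autoencoder style)

def embed_visual(rgb, flow):
    """Fuse two-stream (appearance + optical flow) features, project, L2-normalize."""
    v = np.concatenate([rgb, flow])
    z = v @ W_vis
    return z / np.linalg.norm(z)

def embed_semantic(attr):
    """Encode a class attribute/word vector into the joint space, L2-normalize."""
    z = attr @ W_enc
    return z / np.linalg.norm(z)

# Hypothetical unseen-class semantic vectors.
class_attrs = {c: rng.normal(size=D_SEM) for c in ["jump", "swim", "golf"]}
class_embs = {c: embed_semantic(a) for c, a in class_attrs.items()}

# Zero-shot classification of one clip: pick the class whose semantic
# embedding has the highest cosine similarity to the visual embedding.
rgb_feat, flow_feat = rng.normal(size=D_RGB), rng.normal(size=D_FLOW)
z = embed_visual(rgb_feat, flow_feat)
pred = max(class_embs, key=lambda c: float(z @ class_embs[c]))
print(pred)
```

In training, the visual projection, encoder, and decoder would be optimized jointly (e.g., an embedding alignment loss plus the autoencoder's reconstruction loss), which is what lets semantic relationships learned on seen classes carry over to unseen ones.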
Year: 2019
DOI: 10.1117/1.JEI.28.2.023007
Venue: JOURNAL OF ELECTRONIC IMAGING
Keywords: zero-shot action recognition, spatiotemporal representation, autoencoder, joint embedding space, semantic relationship
Field: Computer vision, Embedding, Computer science, Action recognition, Artificial intelligence
DocType: Journal
Volume: 28
Issue: 2
ISSN: 1017-9909
Citations: 1
PageRank: 0.35
References: 18
Authors: 5
Name            Order  Citations  PageRank
Rongqiao An     1      1          0.35
Zhenjiang Miao  2      3          0.76
Qingyu Li       3      1          0.35
Wanru Xu        4      47         14.23
Qiang Zhang     5      2          1.38