Abstract | ||
---|---|---|
Existing cross-media retrieval methods mostly assume that the training set covers every category in the testing set, so they cannot be extended to retrieve data of new categories. Zero-shot cross-media retrieval is therefore a promising direction for practical applications: it aims to retrieve data of new (unseen) categories while training only on data of a limited set of known (seen) categories. The task is challenging because both the heterogeneous distributions across media types and the inconsistent semantics between seen and unseen categories must be handled. To address these issues, we propose the *dual adversarial distribution network (DADN)*, which learns common embeddings and exploits knowledge from the word embeddings of different categories. The main contributions are as follows. First, a *zero-shot cross-media dual generative adversarial networks architecture* is proposed, in which two kinds of generative adversarial networks (GANs), one for common embedding generation and one for representation reconstruction, form dual processes. The dual GANs reinforce each other in modeling semantic and underlying structure information, which generalizes across categories on heterogeneous distributions and boosts correlation learning. Second, *distribution matching with the maximum mean discrepancy criterion* is combined with the dual GANs to enhance distribution matching between common embeddings and category word embeddings. Finally, an *adversarial inter-media metric constraint* is proposed with an inter-media loss and a quadruplet loss, which further model inter-media correlation information and improve semantic ranking ability. Experiments on four widely used cross-media datasets demonstrate the effectiveness of our DADN approach. |
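
The maximum mean discrepancy (MMD) criterion named in the abstract measures how far apart two sample sets are in a kernel feature space. The sketch below is only an illustration of the general idea, not the authors' implementation. The Gaussian kernel, the bandwidth `sigma`, the biased estimator, and the toy vectors are all assumptions, since the abstract does not specify them.

```python
import math

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    return (mean_kernel(xs, xs, sigma)
            + mean_kernel(ys, ys, sigma)
            - 2.0 * mean_kernel(xs, ys, sigma))

# Hypothetical toy data standing in for common embeddings and
# category word embeddings; real embeddings would be high-dimensional.
common_embeddings = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]]
word_embeddings = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]

print(mmd_squared(common_embeddings, common_embeddings))  # identical sets: 0.0
print(mmd_squared(common_embeddings, word_embeddings))    # well-separated sets: clearly positive
```

Used as a training loss, a term of this kind would shrink as the two sets of embeddings become harder to tell apart, which is the sense in which it supports distribution matching.
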
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/TCSVT.2019.2900171 | IEEE Transactions on Circuits and Systems for Video Technology |
Keywords | DocType | Volume |
Gallium nitride,Semantics,Media,Correlation,Training,Dogs,Measurement | Journal | 30 |
Issue | ISSN | Citations |
4 | 1051-8215 | 8 |
PageRank | References | Authors |
0.50 | 9 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jingze Chi | 1 | 8 | 0.84 |
Yuxin Peng | 2 | 1122 | 74.90 |