Title
W2VV++: Fully Deep Learning for Ad-hoc Video Search
Abstract
Ad-hoc video search (AVS) is an important yet challenging problem in multimedia retrieval. Different from previous concept-based methods, we propose a fully deep learning method for query representation learning. The proposed method requires no explicit concept modeling, matching and selection. The backbone of our method is the proposed W2VV++ model, a super version of Word2VisualVec (W2VV) previously developed for visual-to-text matching. W2VV++ is obtained by tweaking W2VV with a better sentence encoding strategy and an improved triplet ranking loss. With these simple yet important changes, W2VV++ brings in a substantial improvement. As our participation in the TRECVID 2018 AVS task and retrospective experiments on the TRECVID 2016 and 2017 data show, our best single model, with an overall inferred average precision (infAP) of 0.157, outperforms the state-of-the-art. The performance can be further boosted by model ensemble using late average fusion, reaching a higher infAP of 0.163. With W2VV++, we establish a new baseline for ad-hoc video search.
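The abstract names two ingredients: an improved triplet ranking loss and model ensembling by late average fusion. Below is a minimal NumPy sketch of both, assuming the commonly used hardest-negative (VSE++-style) formulation of the improved loss and a square similarity matrix whose diagonal holds the matched video-sentence pairs. The function names and the margin value are illustrative, not taken from the paper.

```python
import numpy as np

def improved_triplet_loss(sim, margin=0.2):
    """Hardest-negative triplet ranking loss (assumed VSE++-style variant).

    sim[i, j] is the similarity between video i and query sentence j;
    the diagonal entries are the ground-truth matched pairs.
    """
    pos = np.diag(sim)                                   # s(v_j, q_j) per query
    viol = np.maximum(0.0, margin + sim - pos[None, :])  # margin violations
    np.fill_diagonal(viol, 0.0)                          # drop the positive pair
    return viol.max(axis=0).mean()                       # hardest negative per query

def late_average_fusion(score_matrices):
    """Late average fusion: average per-model video-query score matrices."""
    return np.mean(score_matrices, axis=0)
```

A usage note: with `sim = [[0.5, 0.4], [0.6, 0.7]]` and margin 0.2, only query 0 has a violating negative (0.2 + 0.6 - 0.5 = 0.3), so the loss averages to 0.15; fusion simply ranks videos by the averaged scores.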
Year
2019
DOI
10.1145/3343031.3350906
Venue
Proceedings of the 27th ACM International Conference on Multimedia
Keywords
ad-hoc video search, cross-modal matching, deep learning, query representation learning, trecvid benchmarks
Field
Computer science, Artificial intelligence, Deep learning, Multimedia
DocType
Conference
ISBN
978-1-4503-6889-6
Citations
8
PageRank
0.52
References
0
Authors
5
Name              Order  Citations  PageRank
Xirong Li         1      1191       68.62
Chaoxi Xu         2      29         4.63
Gang Yang         3      531        5.64
Zhineng Chen      4      1922       5.29
Tiberio Uricchio  5      1511       5.93