Learning visual and textual representations for multimodal matching and classification. - Citegraph

Paper Info

Title
Learning visual and textual representations for multimodal matching and classification.

Abstract
• A unified network for image-text matching and classification.
• Seamlessly incorporating the matching and classification components.
• A multi-stage training algorithm by combining the matching and classification loss.
• Comprehensive study on the effectiveness of the proposed approach.
• Comparisons on four well-known multimodal benchmarks.

Year	DOI	Venue
2018	10.1016/j.patcog.2018.07.001	Pattern Recognition
Keywords	Field	DocType
Vision and language,Multimodal matching,Multimodal classification,Deep learning	Embedding,Pattern recognition,Artificial intelligence,Unified Model,Multimodal learning,Discriminative model,Machine learning,Mathematics	Journal
Volume	Issue	ISSN
84	1	0031-3203
Citations	PageRank	References
4	0.42	46
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Yu Liu	1	198	25.45
Li Liu	2	733	50.04
Yanming Guo	3	128	13.06
Michael S. Lew	4	2742	166.02
