Abstract |
---|
We study the task of short video understanding and recommendation, which predicts a user's preference from multimodal content, including visual, text, and audio features as well as the user's interaction history. In this paper, we present a multi-modal representation learning method to improve the performance of recommender systems. The method first converts multi-modal content into vectors in the embedding space, and then concatenates these vectors as the input of a multi-layer perceptron to make predictions. We also propose a novel Key-Value Memory that maps dense real values into vectors, capturing richer semantics in a nonlinear manner. Experimental results show that our representation significantly improves several baselines and achieves superior performance on the dataset of the ICME 2019 Short Video Understanding and Recommendation Challenge. |
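The abstract's pipeline can be sketched briefly: each dense real-valued feature is mapped to a vector by a Key-Value Memory, and the result is concatenated with the other modality embeddings to form the MLP input. The paper does not specify the memory's scoring function, so the distance-based softmax below, along with all names, shapes, and the `temperature` parameter, is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def key_value_embed(x, keys, values, temperature=1.0):
    """Map a dense scalar feature x to a vector via a Key-Value Memory.

    Attention weights come from the negative distance between x and each
    learned key; the output is the softmax-weighted sum of the value
    vectors. The scoring function is an assumption -- the paper only
    states that the mapping is nonlinear.
    """
    scores = -np.abs(x - keys) / temperature       # (num_slots,)
    weights = np.exp(scores - scores.max())        # stable softmax
    weights /= weights.sum()
    return weights @ values                        # (dim,)

num_slots, dim = 8, 16
keys = rng.normal(size=num_slots)                  # learned scalar keys
values = rng.normal(size=(num_slots, dim))         # learned value vectors

# Embed one dense feature (e.g. a normalized video duration) ...
dense_vec = key_value_embed(0.3, keys, values)

# ... then concatenate with the modality embeddings as the MLP input.
visual, text, audio = (rng.normal(size=dim) for _ in range(3))
mlp_input = np.concatenate([visual, text, audio, dense_vec])
```

In training, `keys` and `values` would be learned jointly with the MLP; the sketch only shows the forward mapping.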
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ICMEW.2019.00134 | 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) |
Keywords | Field | DocType
---|---|---|
Multi-modal Representation, Factorization Machine, Key-Value Memory, Word2Vec, DeepWalk | Recommender system, Embedding, Nonlinear system, Pattern recognition, Computer science, Artificial intelligence, Word2vec, Perceptron, Feature learning, Modal | Conference
ISSN | ISBN | Citations
---|---|---|
2330-7927 | 978-1-5386-9215-8 | 0
PageRank | References | Authors
---|---|---|
0.34 | 7 | 5
Name | Order | Citations | PageRank |
---|---|---|---|
Daya Guo | 1 | 6 | 4.81 |
Jiangshui Hong | 2 | 0 | 0.34 |
Binli Luo | 3 | 0 | 0.34 |
Qirui Yan | 4 | 0 | 0.34 |
Zhangming Niu | 5 | 0 | 0.68 |