Title
Deep Multimodal Embedding: Manipulating Novel Objects with Point-clouds, Language and Trajectories
Abstract
A robot operating in a real-world environment needs to perform reasoning with a variety of sensing modalities. However, manually designing features that allow a learning algorithm to relate these different modalities can be extremely challenging. In this work, we consider the task of manipulating novel objects and appliances. To this end, we learn to embed point-cloud, natural language, and manipulation trajectory data into a shared embedding space using a deep neural network. In order to learn semantically meaningful spaces throughout our network, we introduce a method for pre-training its lower layers for multimodal feature embedding and a method for fine-tuning this embedding space using a loss-based margin. We test our model on the Robobarista dataset [22], where we achieve significant improvements in both accuracy and inference time over the previous state of the art.
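Below is a minimal sketch (not the authors' code) of the idea the abstract describes: three modality-specific networks map point-cloud, language, and trajectory features into one shared embedding space, which is then fine-tuned with a hinge objective whose margin is scaled by a trajectory loss. The layer sizes, the sum fusion of point-cloud and language embeddings, cosine similarity, and the placeholder trajectory_loss are all illustrative assumptions, not the paper's exact choices (the paper uses a DTW-style trajectory distance for the loss-based margin).

import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 128  # assumed size of the shared embedding space


def encoder(in_dim: int) -> nn.Sequential:
    """Two-layer MLP mapping one modality's features into the shared space."""
    return nn.Sequential(
        nn.Linear(in_dim, 256), nn.ReLU(),
        nn.Linear(256, EMBED_DIM),
    )

pointcloud_net = encoder(in_dim=1000)  # e.g. voxel-grid features (assumed)
language_net = encoder(in_dim=300)     # e.g. bag-of-words features (assumed)
trajectory_net = encoder(in_dim=150)   # e.g. waypoint features (assumed)


def similarity(a, b):
    # Cosine similarity in the shared space (an assumption; a dot product
    # or negative Euclidean distance would fit the same framework).
    return F.cosine_similarity(a, b, dim=-1)


def trajectory_loss(tau_gt, tau_other):
    """Placeholder task loss Delta(tau, tau'); the paper uses a
    DTW-based distance between manipulation trajectories."""
    return (tau_gt - tau_other).abs().mean(dim=-1)


def loss_based_margin_loss(pc_feat, lang_feat, traj_gt, traj_neg):
    """Hinge loss with a loss-based margin: the ground-truth trajectory
    must out-score a negative one by at least Delta(tau_gt, tau_neg)."""
    scene = pointcloud_net(pc_feat) + language_net(lang_feat)  # assumed fusion
    pos = similarity(scene, trajectory_net(traj_gt))
    neg = similarity(scene, trajectory_net(traj_neg))
    margin = trajectory_loss(traj_gt, traj_neg)
    return F.relu(margin + neg - pos).mean()

Given batches of features, minimizing loss_based_margin_loss with any standard optimizer fine-tunes the joint space so that trajectories that are more different from the ground truth must be pushed further away; the multimodal pre-training of the lower layers mentioned in the abstract is omitted here for brevity.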
Year
2015
DOI
10.1109/ICRA.2017.7989325
Venue
International Conference on Robotics and Automation (ICRA)
Field
Modalities, Computer vision, Embedding, Computer science, Inference, Natural language, Artificial intelligence, Robot, Point cloud, Artificial neural network, Machine learning, Trajectory
DocType
Journal
Volume
abs/1509.07831
Issue
1
Citations
2
PageRank
0.39
References
25
Authors
3
Name             Order  Citations  PageRank
Jaeyong Sung     1      395        14.51
Ian Lenz         2      323        12.07
Ashutosh Saxena  3      4575       227.88