Abstract |
---|
We present Marvin, a system that lets users query physical objects from a mobile or wearable device. It integrates HOG-based object recognition, SURF-based localization, automatic speech recognition, and user feedback in a probabilistic model to identify the “object of interest” with high accuracy at interactive speeds. Once the object of interest is recognized, the information the user is querying, e.g., reviews or purchase options, is displayed on the user's mobile or wearable device. We tested this prototype in a real-world retail store during business hours, with varying degrees of background noise and clutter. We show that this multi-modal approach achieves higher recognition accuracy than a vision system alone, especially in cluttered scenes where a vision system cannot determine which object the user means without additional input. The system scales computationally to large numbers of objects by focusing compute-intensive resources on the objects most likely to be of interest, as inferred from user speech and implicit localization information. We present the system architecture, the probabilistic model that integrates the multi-modal information, and empirical results showing the benefits of multi-modal integration. |
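The abstract describes fusing evidence from several modalities (HOG detector scores, ASR output, SURF-based localization, user feedback) in a probabilistic model to rank candidate objects. The paper's actual model is not reproduced here; the following is only a minimal naive-Bayes-style sketch of that idea, with hypothetical modality scores, where each modality contributes an independent likelihood over candidate objects and the posterior is the normalized product:

```python
import math

def fuse_modalities(priors, likelihoods):
    """Naive-Bayes-style fusion (illustrative, not the paper's model):
    multiply per-modality likelihoods with a prior over candidate
    objects, then normalize. Work in log space for numerical stability."""
    log_scores = {}
    for obj, prior in priors.items():
        s = math.log(prior)
        for modality in likelihoods:
            s += math.log(modality.get(obj, 1e-9))  # floor for unseen objects
        log_scores[obj] = s
    # normalize with log-sum-exp
    m = max(log_scores.values())
    total = sum(math.exp(s - m) for s in log_scores.values())
    return {obj: math.exp(s - m) / total for obj, s in log_scores.items()}

# Hypothetical scores for three candidate products in a store
priors   = {"camera": 1/3, "lens": 1/3, "tripod": 1/3}
vision   = {"camera": 0.6, "lens": 0.3, "tripod": 0.1}  # e.g. HOG detector scores
speech   = {"camera": 0.7, "lens": 0.2, "tripod": 0.1}  # e.g. ASR keyword match
location = {"camera": 0.5, "lens": 0.4, "tripod": 0.1}  # e.g. SURF-based aisle estimate

posterior = fuse_modalities(priors, [vision, speech, location])
```

In this toy example each modality only weakly favors "camera", but the fused posterior concentrates on it, illustrating how combining modalities can disambiguate cluttered scenes where any single cue is inconclusive.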
Year | DOI | Venue |
---|---|---|
2014 | 10.1109/WACV.2014.6836103 | WACV |
Keywords | Field | DocType
---|---|---|
mobile device, speech recognition, multimodal sensing, multimodal integration, user interfaces, compute-intensive resources, multi-modal approach, marvin, user feedback, physical querying, multimodal information, cluttered scenes, system architecture, vision system, business hours, object recognition, interactive speeds, superior recognition accuracy, background noise, hog-based object recognition, wearable device, surf-based localization information, automatic speech recognition, probabilistic model, feature extraction, speech, visualization, computer architecture | Computer vision, 3D single-object recognition, Speech analytics, Machine vision, Visualization, Computer science, Wearable computer, Feature extraction, Artificial intelligence, Systems architecture, Cognitive neuroscience of visual object recognition | Conference
ISSN | Citations | PageRank
---|---|---|
2472-6737 | 0 | 0.34
References | Authors
---|---|
8 | 9
Name | Order | Citations | PageRank |
---|---|---|---|
Iljoo Baek | 1 | 1 | 1.38 |
Taylor Stine | 2 | 0 | 0.34 |
Denver Dash | 3 | 0 | 0.34 |
Fanyi Xiao | 4 | 63 | 4.92 |
Yaser Sheikh | 5 | 2118 | 92.13 |
Yair Movshovitz-Attias | 6 | 87 | 3.90 |
Mei Chen | 7 | 418 | 36.25 |
Martial Hebert | 8 | 11277 | 1146.89 |
Takeo Kanade | 9 | 25073 | 4203.02 |