Paper Info

Title
Situated reference resolution using visual saliency and crowdsourcing-based priors for a spoken dialog system within vehicles.

Abstract
In this paper, we address issues in situated language understanding in a moving car. More specifically, we propose a reference resolution method to identify user queries about specific target objects in their surroundings. We investigate methods of predicting which target object is likely to be queried given a visual scene, and what kinds of linguistic cues users naturally provide to describe a given target object in a situated environment. We propose methods to incorporate the visual saliency of the scene as a prior. Crowdsourced statistics of how people describe an object are also used as a prior. We have collected situated utterances from drivers using our research system, which was embedded in a real vehicle. We demonstrate that the proposed algorithms improve the target identification rate by 15.1% absolute over the baseline method, which does not use a visual saliency-based prior and depends on a public database with limited category information.

Year: 2018
DOI: 10.1016/j.csl.2017.09.001
Venue: Computer Speech & Language
Keywords: Situated dialog, In-car interaction, Visual saliency, Crowdsourcing, Multimodal interaction
Field: Situated, Spoken dialog, Computer science, Crowdsourcing, Speech recognition, Prior probability, Language understanding, Visual saliency
DocType: Journal
Volume: 48
Issue: C
ISSN: 0885-2308
Citations: 0
PageRank: 0.34
References: 18
Authors: 1

Authors (1 row)

Name	Order	Citations	PageRank
Teruhisa Misu	1	19	5.89
