Title: Grounding spatial language for video search
Abstract: The ability to find a video clip that matches a natural language description of an event would enable intuitive search of large databases of surveillance video. We present a mechanism for connecting a spatial language query to the video clips it describes. The system can retrieve clips matching millions of potential queries that describe complex events in video, such as "people walking from the hallway door, around the island, to the kitchen sink." By breaking the query into a sequence of independent structured clauses and modeling the meaning of each component of the structure separately, we improve on previous approaches to video retrieval, finding clips that match much longer and more complex queries using a rich set of spatial relations such as "down" and "past." We present a rigorous analysis of the system's performance, based on a large corpus of task-constrained language collected from fourteen subjects. Using this corpus, we show that the system effectively retrieves clips that match natural language descriptions: 58.3% were ranked in the top two of ten in a retrieval task. Furthermore, we show that spatial relations play an important role in the system's performance.
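To make the clause-based approach described in the abstract concrete, here is a minimal, illustrative sketch: split a query into (spatial relation, landmark) clauses, score a tracked person's path against each clause independently, and rank clips by the combined score. The landmark coordinates, relation scorers, and clip tracks below are invented for illustration; the paper's actual probabilistic models of spatial relations are more sophisticated than these toy distance functions.

```python
# Illustrative sketch only: rank clips by summing independent per-clause
# scores, mirroring the abstract's decomposition of a query into clauses.
# All landmarks, scorers, and tracks are hypothetical, not the paper's model.
import math
from typing import Callable

Point = tuple[float, float]

# Hypothetical landmark positions in floor-plan coordinates.
LANDMARKS: dict[str, Point] = {
    "hallway door": (0.0, 0.0),
    "island": (5.0, 3.0),
    "kitchen sink": (10.0, 6.0),
}

def dist(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Toy scorers for three relations; higher means a better match.
def score_from(track: list[Point], landmark: Point) -> float:
    return 1.0 / (1.0 + dist(track[0], landmark))       # starts near landmark

def score_to(track: list[Point], landmark: Point) -> float:
    return 1.0 / (1.0 + dist(track[-1], landmark))      # ends near landmark

def score_past(track: list[Point], landmark: Point) -> float:
    return 1.0 / (1.0 + min(dist(p, landmark) for p in track))  # passes close by

RELATIONS: dict[str, Callable[[list[Point], Point], float]] = {
    "from": score_from, "to": score_to, "past": score_past,
}

def score_clip(clauses: list[tuple[str, str]], track: list[Point]) -> float:
    """Sum per-clause scores; each clause is (relation, landmark name)."""
    return sum(RELATIONS[rel](track, LANDMARKS[lm]) for rel, lm in clauses)

# Rank two toy clips for "from the hallway door, past the island, to the sink".
query = [("from", "hallway door"), ("past", "island"), ("to", "kitchen sink")]
clips = {
    "clip_a": [(0.1, 0.0), (5.0, 2.9), (9.8, 6.1)],   # follows the described path
    "clip_b": [(10.0, 6.0), (7.0, 1.0), (0.2, 0.3)],  # walks the other way
}
for name, track in sorted(clips.items(), key=lambda kv: -score_clip(query, kv[1])):
    print(name, round(score_clip(query, track), 3))
```

Running the sketch ranks clip_a, whose track follows the described path, well above clip_b, which moves in the opposite direction; scoring each clause independently is what lets long, multi-clause queries be handled compositionally.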
Year: 2010
DOI: 10.1145/1891903.1891944
Venue: ICMI-MLMI
Keywords: natural language description, video clip, task-constrained language, spatial language query, surveillance video, complex event, spatial relation, video search, video retrieval, potential query, complex query, natural language
Field: Spatial relation, Computer vision, RDF query language, Information retrieval, Ranking, Computer science, Natural language, Video tracking, Ground, Artificial intelligence, Spatial language, CLIPS
DocType: Conference
Citations: 2
PageRank: 0.41
References: 9
Authors: 5

Name             Order  Citations  PageRank
Stefanie Tellex  1      541        48.69
Thomas Kollar    2      580        32.64
George Shaw      3      2          0.41
Nicholas Roy     4      3644       288.27
Deb Roy          5      1033       92.10