Abstract |
---|
For intelligent conversational agents to speak about albums of images with users as humans do, they must be able to make sense of images as humans do. Computer vision methods can report directly observable information, but human beings care about more than the directly observable; they value holistic narratives that include affective and motivational evaluations, causal connections, and other relationships inferred from external knowledge. Drawing from theories in cognitive sensemaking and narrative coherence, we propose an approach for image sequence understanding that uses commonsense knowledge to generate and evaluate hypotheses about the relationships between people, events, and objects in images. These hypotheses are formed into a consistent network of hypotheses and observed facts via multi-objective optimization. The result is an enriched knowledge representation in the form of a knowledge graph, which may later be used by a conversational agent. |
Year | DOI | Venue |
---|---|---|
2021 | 10.5555/3463952.3464124 | AAMAS |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zev Battad | 1 | 0 | 0.34 |
Mei Si | 2 | 259 | 24.87 |