Abstract |
---|
There is a growing interest in models that can learn from unlabelled speech paired with visual context. This setting is relevant for low-resource speech processing, robotics, and human language acquisition research. Here, we study how a visually grounded speech model, trained on images of scenes paired with spoken captions, captures aspects of semantics. We use an external image tagger to generate... |

Year | DOI | Venue |
---|---|---|
2019 | 10.1109/TASLP.2018.2872106 | IEEE/ACM Transactions on Audio, Speech, and Language Processing |

Keywords | DocType | Volume |
---|---|---|
Semantics, Visualization, Task analysis, Predictive models, Analytical models, Speech processing, Data models | Journal | 27 |

Issue | ISSN | Citations |
---|---|---|
1 | 2329-9290 | 4 |

PageRank | References | Authors |
---|---|---|
0.40 | 71 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Herman Kamper | 1 | 150 | 20.70 |
Gregory Shakhnarovich | 2 | 1579 | 106.33 |
Karen Livescu | 3 | 1254 | 71.43 |