Abstract |
---|
An important problem in speech recognition, and in activity recognition generally, is to develop analyses that are invariant to execution rate. We introduce a theoretical framework that provides a parametrization-invariant metric for comparing parametrized paths on Riemannian manifolds. Treating instances of activities as parametrized paths on a Riemannian manifold of covariance matrices, we apply this framework to the problem of visual speech recognition from image sequences. We represent each sequence as a path on the space of covariance matrices, each covariance matrix capturing the spatial variability of visual features in a frame, and perform simultaneous pairwise temporal alignment and comparison of paths. This removes temporal variability and helps provide a robust metric for visual speech classification. We evaluate this idea on the OuluVS database, where temporal alignment improves the rank-1 nearest-neighbor classification rate from 32% to 57%. |
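The abstract describes a pipeline of per-frame covariance matrices, a distance between them, temporal alignment of the resulting paths, and rank-1 nearest-neighbor classification. The sketch below illustrates that pipeline under simplifying assumptions and is not the authors' method. Each frame is summarized by 2-D feature vectors (for instance, horizontal and vertical intensity gradients at each pixel of a mouth region; the abstract does not specify the features), so that the SPD matrix logarithm has a closed form. Frames are compared with the log-Euclidean distance. Alignment uses classic dynamic time warping (DTW) as a swapped-in stand-in for the paper's parametrization-invariant metric: DTW is a simpler alignment cost and is not a true metric. All function names (`covariance_2d`, `spd_log`, `log_euclidean_dist`, `dtw_distance`, `nearest_neighbor_label`) and the toy frames are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A 2x2 symmetric matrix stored as ((a, b), (b, c)).
Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def covariance_2d(features: Sequence[Tuple[float, float]], eps: float = 1e-6) -> Matrix2:
    """Sample covariance of one frame's 2-D feature vectors, regularized to stay SPD."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two feature vectors per frame")
    mx = sum(f[0] for f in features) / n
    my = sum(f[1] for f in features) / n
    sxx = sum((f[0] - mx) ** 2 for f in features) / (n - 1) + eps
    syy = sum((f[1] - my) ** 2 for f in features) / (n - 1) + eps
    sxy = sum((f[0] - mx) * (f[1] - my) for f in features) / (n - 1)
    return ((sxx, sxy), (sxy, syy))


def spd_log(m: Matrix2) -> Matrix2:
    """Matrix logarithm of a 2x2 SPD matrix via its closed-form eigenvalues."""
    (a, b), (_, c) = m
    half_trace = (a + c) / 2.0
    r = math.hypot((a - c) / 2.0, b)
    l1, l2 = half_trace + r, half_trace - r  # eigenvalues, l1 >= l2
    if l2 <= 0.0:
        raise ValueError("matrix is not positive definite")
    if r < 1e-12:  # repeated eigenvalue: m is (numerically) a multiple of I
        v = math.log(half_trace)
        return ((v, 0.0), (0.0, v))
    f1, f2 = math.log(l1), math.log(l2)
    d = l1 - l2
    # Sylvester's formula: log m = [f1 (m - l2 I) - f2 (m - l1 I)] / (l1 - l2)
    la = (f1 * (a - l2) - f2 * (a - l1)) / d
    lc = (f1 * (c - l2) - f2 * (c - l1)) / d
    lb = b * (f1 - f2) / d
    return ((la, lb), (lb, lc))


def log_euclidean_dist(p: Matrix2, q: Matrix2) -> float:
    """Frobenius norm of log(p) - log(q), the log-Euclidean distance on SPD matrices."""
    lp, lq = spd_log(p), spd_log(q)
    return math.sqrt(sum((lp[i][j] - lq[i][j]) ** 2 for i in range(2) for j in range(2)))


def dtw_distance(path_a: Sequence[Matrix2], path_b: Sequence[Matrix2]) -> float:
    """Cost of the best monotone alignment of two covariance paths.

    Stand-in: classic DTW, not the paper's parametrization-invariant metric.
    """
    n, m = len(path_a), len(path_b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = log_euclidean_dist(path_a[i - 1], path_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def nearest_neighbor_label(query: Sequence[Matrix2],
                           gallery: List[Tuple[str, Sequence[Matrix2]]]) -> str:
    """Rank-1 nearest neighbor: label of the gallery path with the smallest DTW cost."""
    return min(gallery, key=lambda item: dtw_distance(query, item[1]))[0]


if __name__ == "__main__":
    p = covariance_2d([(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)])
    q = covariance_2d([(0.0, 1.0), (1.0, 0.0), (0.5, 2.0)])
    r = covariance_2d([(0.0, 0.0), (3.0, 0.1), (6.0, 0.3)])
    slow = [p, p, q, q, r]  # the same toy "utterance" at a slower rate
    fast = [p, q, r]
    other = [r, r, p]
    print(dtw_distance(slow, fast))  # 0.0: alignment absorbs the repeated frames
    print(nearest_neighbor_label(slow, [("word_a", fast), ("word_b", other)]))  # word_a
```

Here `slow` only repeats frames of `fast`, so their alignment cost is zero and the query is labeled `word_a`. A rigid frame-by-frame comparison could not even be computed for these two paths because their lengths differ. The log-Euclidean distance was chosen for its closed form; the affine-invariant Riemannian distance is another common choice for covariance matrices.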
Year | DOI | Venue |
---|---|---|
2013 | 10.1109/NCVPRIPG.2013.6776200 | National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG) |
Keywords | DocType | ISSN |
---|---|---|
speech recognition | Conference | 2372-658X |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 12 |
Authors |
---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jing-yong Su | 1 | 156 | 10.93 |
Anuj Srivastava | 2 | 2853 | 199.47 |
Fillipe Souza | 3 | 0 | 0.34 |
Sudeep Sarkar | 4 | 2839 | 317.68 |