Abstract | ||
---|---|---|
We formulate the problem of audio-visual speaker association as a dynamic dependency test. That is, given an audio stream and multiple video streams, we wish to determine their dependency structure as it evolves over time. To this end, we propose the use of a hidden factorization Markov model in which the hidden state encodes a finite number of possible dependency structures. Each dependency structure has an explicit semantic meaning, namely "who is speaking". This model takes advantage of both the structural and the parametric changes associated with a change in speaker, in contrast to standard sliding-window-based dependence analysis. Using this model, we obtain state-of-the-art performance on an audio-visual association task without the benefit of training data. |
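
The idea of a hidden state that selects one of a finite set of dependency structures can be illustrated with a toy sketch. This is not the authors' implementation. It assumes scalar, zero-mean, unit-variance features, a known audio-video correlation `rho`, and a fixed "sticky" transition matrix, whereas the abstract describes a model that needs no training data. The function names `dependency_gain`, `decode_speakers`, and `simulate` are hypothetical.

```python
import numpy as np


def dependency_gain(audio, videos, rho=0.8):
    """Per-frame log-likelihood of each dependency structure, up to a shared constant.

    Structure k says: the audio depends on video stream k and is independent
    of every other stream. Assumption (not from the paper): scalar, zero-mean,
    unit-variance Gaussian features with a known correlation rho.
    """
    a = np.asarray(audio, dtype=float)[None, :]   # shape (1, T)
    v = np.asarray(videos, dtype=float)           # shape (K, T)
    s = 1.0 - rho ** 2
    # log N([a, v]; correlated) - log N(a) - log N(v). Only this term differs
    # between structures; the rest is common to all of them.
    return (-0.5 * np.log(s)
            - (a ** 2 - 2 * rho * a * v + v ** 2) / (2 * s)
            + 0.5 * (a ** 2 + v ** 2))


def decode_speakers(log_lik, stay_prob=0.99):
    """Viterbi decoding of the most likely sequence of dependency structures.

    log_lik has shape (K, T) with K >= 2. The sticky transition matrix is an
    illustrative choice, not a value taken from the paper.
    """
    K, T = log_lik.shape
    log_trans = np.full((K, K), np.log((1.0 - stay_prob) / (K - 1)))
    np.fill_diagonal(log_trans, np.log(stay_prob))
    score = np.log(1.0 / K) + log_lik[:, 0]
    back = np.zeros((K, T), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans          # cand[i, j]: state i -> state j
        back[:, t] = np.argmax(cand, axis=0)       # best predecessor of each j
        score = cand[back[:, t], np.arange(K)] + log_lik[:, t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[path[t], t]
    return path


def simulate(speakers, n_streams, rho=0.8, seed=0):
    """Synthetic data: the audio is correlated only with the active speaker's video."""
    rng = np.random.default_rng(seed)
    speakers = np.asarray(speakers)
    T = speakers.size
    videos = rng.standard_normal((n_streams, T))
    noise = rng.standard_normal(T)
    active = videos[speakers, np.arange(T)]
    audio = rho * active + np.sqrt(1.0 - rho ** 2) * noise
    return audio, videos
```

A typical use would be to build a ground-truth speaker sequence such as `np.repeat([0, 1, 0], 150)`, pass it to `simulate`, and compare the output of `decode_speakers(dependency_gain(audio, videos))` against it. Because evidence is accumulated through the Markov chain rather than inside a fixed-length window, no window size has to be chosen.
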
Year | DOI | Venue |
---|---|---|
2007 | 10.1109/ICASSP.2007.366271 | Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on |
Keywords | Field | DocType |
Markov processes, audio signal processing, speaker recognition, video signal processing, audio stream, audio-visual speaker association, dynamic dependency tests, hidden factorization Markov model, multiple video streams, pattern clustering methods | Markov process, Sliding window protocol, Multivalued dependency, Pattern recognition, Markov model, Computer science, Join dependency, Speech recognition, Speaker recognition, Artificial intelligence, Audio signal processing, Hidden Markov model | Conference |
Volume | ISSN | ISBN |
2 | 1520-6149 | 1-4244-0727-3 |
Citations | PageRank | References |
5 | 0.51 | 7 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Michael R. Siracusa | 1 | 13 | 1.62 |
John W. Fisher III | 2 | 878 | 74.44 |
Fisher, J.W. | 3 | 5 | 0.51 |