| Abstract |
|---|
| An audio-visual localisation and tracking system for meeting scenarios is presented which draws its inspiration from neurobiological processing. Meetings are recorded by a KEMAR binaural manikin and a single camera placed directly above the manikin. Source localisation from the binaural audio and face, object and motion locations from the video frames are used as input to two linked neural oscillator networks. The strength of the connections between the two networks determines the mapping between activity at a particular audio azimuth and activity at a particular visual frame column. A Hebbian learning rule is used to establish the connection strengths. The combined network segments the video and audio features and then produces audio-visual groupings on the basis of common spatial location. The audio-visual groupings are tracked through time using a mechanism based upon that of the human oculomotor system which incorporates smooth pursuit and saccadic movement. |
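The abstract describes a Hebbian rule that learns connection strengths mapping activity at an audio azimuth to activity at a visual frame column. The following is a minimal illustrative sketch of that idea, not the paper's implementation: the map sizes, learning rate, and the assumed linear azimuth-to-column correspondence used to generate training co-activity are all assumptions for demonstration.

```python
import numpy as np

# Illustrative sketch of a Hebbian mapping between an audio azimuth map
# and visual frame columns. All sizes and constants below are assumed,
# not taken from the paper.
N_AZIMUTHS = 37   # e.g. -90..+90 degrees in 5-degree steps (assumed)
N_COLUMNS = 64    # visual frame columns (assumed)
ETA = 0.05        # learning rate (assumed)

W = np.zeros((N_AZIMUTHS, N_COLUMNS))  # audio-to-visual connection strengths

def hebbian_update(W, audio_act, visual_act, eta=ETA):
    """Strengthen connections in proportion to co-activity (outer product)."""
    W += eta * np.outer(audio_act, visual_act)
    np.clip(W, 0.0, 1.0, out=W)  # keep weights bounded
    return W

def column_for_azimuth(a):
    """Assumed ground-truth correspondence: linear azimuth-to-column map."""
    return int(a * (N_COLUMNS - 1) / (N_AZIMUTHS - 1))

# Simulated training: a source active at azimuth bin `a` co-occurs with
# activity at the corresponding visual column.
for _ in range(5):
    for a in range(N_AZIMUTHS):
        audio_act = np.zeros(N_AZIMUTHS)
        audio_act[a] = 1.0
        visual_act = np.zeros(N_COLUMNS)
        visual_act[column_for_azimuth(a)] = 1.0
        hebbian_update(W, audio_act, visual_act)

# After learning, the strongest connection from each azimuth bin points
# at the frame column it was repeatedly co-active with.
learned = W.argmax(axis=1)
```

Because the update is a simple outer-product rule, connections that fire together grow together, so after training `learned[a]` recovers the assumed azimuth-to-column correspondence.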
| Year | Venue | Keywords |
|---|---|---|
| 2005 | INTERSPEECH | hebbian learning, tracking system, smooth pursuit |
| Field | DocType | Citations |
|---|---|---|
| Smooth pursuit, Computer vision, Computer science, Tracking system, Azimuth, Speech recognition, Hebbian theory, Artificial intelligence, Saccadic masking, Binaural recording | Conference | 1 |
| PageRank | References | Authors |
|---|---|---|
| 0.40 | 3 | 2 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Stuart N. Wrigley | 1 | 181 | 20.56 |
| Guy J. Brown | 2 | 760 | 97.54 |