Abstract |
---|
We address the problem of segmentation and recognition of sequences of multimodal human interactions in meetings. These interactions can be seen as a rough structure of a meeting, and can be used either as input for a meeting browser or as a first step towards a higher semantic analysis of the meeting. A common lexicon of multimodal group meeting actions, a shared meeting data set, and a common evaluation procedure enable us to compare the different approaches. We compare three different multimodal feature sets and our modelling infrastructures: a higher semantic feature approach, multi-layer HMMs, a multi-stream DBN, and a multi-stream mixed-state DBN for disturbed data. |
Year | DOI | Venue |
---|---|---|
2005 | 10.1007/11677482_5 | MLMI |
Keywords | Field | DocType
---|---|---|
multimodal integration, shared meeting data, common lexicon, multimodal human interaction, higher semantic analysis, disturbed data, different approach, meeting group action segmentation, different multimodal feature set, multimodal group meeting action, common evaluation procedure, meeting browser, human interaction, vision, group action | Computer science, Segmentation, Speech recognition, Lexicon, Natural language processing, Artificial intelligence, Semantic feature, Machine learning | Conference
Volume | ISSN | ISBN
---|---|---|
3869 | 0302-9743 | 3-540-32549-2
Citations | PageRank | References
---|---|---|
11 | 0.65 | 13
Authors |
---|
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Marc Al-Hames | 1 | 116 | 8.75 |
Alfred Dielmann | 2 | 171 | 11.64 |
Daniel Gatica-Perez | 3 | 4182 | 276.74 |
Stephan Reiter | 4 | 278 | 17.21 |
Steve Renals | 5 | 2570 | 293.02 |
Gerhard Rigoll | 6 | 2788 | 268.87 |
Dong Zhang | 7 | 646 | 38.04 |