Title
Towards a low bandwidth talking face using appearance models
Abstract
The paper is motivated by the need to develop low bandwidth virtual humans capable of delivering audio-visual speech and sign language at a quality comparable to high bandwidth video. The number of bits required for animating a virtual human is significantly reduced by using an appearance model combined with parameter compression. A new perceptual method is introduced and used to evaluate the quality of the synthesised sequences. It appears that 3.6 kbit/s can still yield acceptable quality.

1 Introduction
Many pre-lingually deaf people find closed caption subtitles in broadcast television of less help than might be expected. Sign language is their first acquired language, and they subsequently have difficulty learning to read and write using the conventions of an oral language. The difficulty is similar to that experienced by hearing people when acquiring a second language (14). Deaf people therefore value the presence of an on-screen signer (13) using, in the UK, British Sign Language (BSL). This has been recognised by UK legislation, which requires terrestrial digital television to provide on-screen signing. This paper is motivated by the need to develop virtual humans capable of delivering sign language at a quality comparable to high bandwidth video. An important feature of such an avatar will be the realistic reproduction of facial gestures. These should be clear enough for lipreading, for which the face, and particularly the tongue, is extremely important, even though the mouth shapes associated with signing are not those of spoken words. For television broadcast purposes an avatar (28) that can be driven at a bandwidth of less than 32 kbit/s is desirable. To broker the trade-off between perceived quality and bandwidth, practical methods for evaluating perceived quality are essential. A new variant of a method for evaluating perceived quality is proposed and illustrated by reporting progress towards a talking face that uses less than five kbit/s.
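To make the bandwidth claim concrete, the sketch below (not from the paper itself) shows how a PCA-based appearance model combined with simple parameter quantisation bounds the bit rate of a talking face. The component count, bit depth and frame rate are illustrative assumptions chosen only to show how a figure in the low kbit/s range can arise.

```python
import numpy as np

# Hypothetical sketch: represent each face frame by a small vector of
# appearance-model (PCA) parameters and quantise those parameters.
# n_components, bits and the 25 fps frame rate are assumptions, not
# figures taken from the paper.

def fit_appearance_model(training_frames, n_components=20):
    """PCA over vectorised training frames (rows = frames, columns = pixels)."""
    mean = training_frames.mean(axis=0)
    centred = training_frames - mean
    # SVD of the centred data gives the principal modes of appearance variation
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_components]            # (n_components, n_pixels)
    return mean, basis

def encode_frame(frame, mean, basis, bits=8):
    """Project one frame onto the model and uniformly quantise each parameter."""
    params = basis @ (frame - mean)      # appearance parameters for this frame
    lo, hi = params.min(), params.max()
    levels = 2 ** bits - 1
    quantised = np.round((params - lo) / (hi - lo + 1e-12) * levels).astype(np.uint16)
    return quantised, (lo, hi)

# Illustrative bit-rate estimate: 20 parameters x 8 bits x 25 frames/s
print(20 * 8 * 25, "bit/s")              # 4000 bit/s, i.e. roughly 4 kbit/s
```

Under these assumed settings the parameter stream alone costs about 4 kbit/s before any further compression, which is the order of magnitude the abstract reports.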
Year
2001
Venue
Image and Vision Computing
Keywords
principal component analysis, talking faces, shape and appearance models, virtual human, sign language
Field
Computer vision, Computer science, Speech recognition, Active appearance model, Bandwidth (signal processing), Sign language, Artificial intelligence, Virtual actor, Perception, Principal component analysis, High bandwidth
DocType
Conference
Citations
2
PageRank
0.44
References
21
Authors
4
Name                  Order  Citations  PageRank
Barry-John Theobald   1      332        25.39
Gavin C. Cawley       2      902        59.96
Silko Kruse           3      16         2.20
J.A. Bangham          4      484        46.38