Abstract | ||
---|---|---|
Lip-reading has been shown to improve the performance of automatic speech recognition systems, especially in the presence of acoustic noise. However, the information captured about lip movement is still insufficient, because lip features are obtained from discrete three-dimensional points and planar images, and the internal mechanisms of lip movement are neither described nor reflected. In this paper, we employ a novel deep learning technique, densely connected convolutional networks (DenseNets), to obtain a visual representation from color images. In addition, a new 3D lip physiological feature based on the position and structure of facial muscles is extracted to represent similarities in the way people speak. The color image feature and the 3D lip geometric-physiological feature are coupled in the last fully-connected layer of the DenseNets. The experimental results show that DenseNets can handle the spatio-temporal information of a whole image sequence, and that the lip feature integrating our proposed 3D geometric-physiological feature improves the recognition rate by as much as 3.91 percentage points (from 94.84% with color images only to 98.75%). |
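
The coupling of the two feature types at the last fully-connected layer can be illustrated with a small late-fusion sketch. The abstract does not say how the features are combined, so plain concatenation is assumed here. The feature sizes, the random weights, and the function names (`concat_features`, `linear`, `softmax`, `classify`) are all hypothetical and are not taken from the paper.

```python
import math
import random


def concat_features(image_feat, lip_feat):
    """Late fusion by concatenation (an assumption, not the paper's stated method)."""
    return list(image_feat) + list(lip_feat)


def linear(x, weights, bias):
    """Fully-connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_feat, lip_feat, weights, bias):
    """Fuse the two feature vectors, then apply the final classifier layer."""
    fused = concat_features(image_feat, lip_feat)
    return softmax(linear(fused, weights, bias))


# Hypothetical sizes: 8-dim image feature, 4-dim 3D lip feature, 3 word classes.
IMAGE_DIM, LIP_DIM, NUM_CLASSES = 8, 4, 3
rng = random.Random(0)

# Stand-ins for the CNN output and the 3D geometric-physiological feature.
image_feat = [rng.uniform(-1, 1) for _ in range(IMAGE_DIM)]
lip_feat = [rng.uniform(-1, 1) for _ in range(LIP_DIM)]

# Random, untrained weights for the final fully-connected layer.
weights = [[rng.uniform(-0.1, 0.1) for _ in range(IMAGE_DIM + LIP_DIM)]
           for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

probs = classify(image_feat, lip_feat, weights, bias)
print(probs)
```

In this sketch the classifier sees a single vector whose length is the sum of the two feature sizes, so either feature can be dropped or replaced without changing the rest of the pipeline, as long as the weight matrix is resized to match.
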
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/ICTAI.2018.00155 | 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI) |
Keywords | Field | DocType |
lip reading, 3D feature extraction, densely connected convolutional networks, facial muscle | Noise, Pattern recognition, Computer science, Visualization, Feature extraction, Lip feature, Facial muscles, Artificial intelligence, Feature based, Image sequence, Color image | Conference |
ISSN | Citations | PageRank |
1082-3409 | 0 | 0.34 |
References | Authors | |
0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jianguo Wei | 1 | 1 | 2.38 |
Fan Yang | 2 | 12 | 1.99 |
Ju Zhang | 3 | 6 | 7.56 |
Ruiguo Yu | 4 | 9 | 12.96 |
Mei Yu | 5 | 6 | 2.85 |
Jianrong Wang | 6 | 17 | 5.69 |