Abstract | ||
---|---|---|
We propose an approach that uses DBM-DNNs for i-vector based audio-visual person identification. Two Deep Boltzmann Machines, DBM$_{\text{speech}}$ and DBM$_{\text{face}}$, are trained without supervision on unlabeled audio and visual data from a set of background subjects. The DBMs are then used to initialize two corresponding DNNs for classification, referred to in this paper as DBM-DNN$_{\text{speech}}$ and DBM-DNN$_{\text{face}}$. The DBM-DNNs are discriminatively fine-tuned with back-propagation on a set of training data and evaluated on a set of test data from the target subjects. We compared their performance with the cosine distance (cosDist) classifier and the state-of-the-art DBN-DNN classifier, and we also tested three different configurations of the DBM-DNNs. We show that, with 400-dimensional i-vectors as input, DBM-DNNs with two hidden layers of 800 units each achieved the best identification performance. Our experiments were carried out on the challenging MOBIO dataset. |
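
The two classifier types compared above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the cosDist baseline assumes each subject is modeled by the mean of its training i-vectors, and the feed-forward network uses the 400-800-800 topology reported above with sigmoid hidden units and a softmax output. Its weights are random placeholders; in the paper they would come from DBM pretraining followed by back-propagation fine-tuning, and no DBM-specific initialization details are reproduced here.

```python
import numpy as np


def cosine_identify(train_ivecs, train_labels, test_ivec):
    """cosDist baseline sketch: pick the subject whose (assumed) mean training
    i-vector has the highest cosine similarity, i.e. smallest cosine distance."""
    labels = np.asarray(train_labels)
    subjects = sorted(set(train_labels))
    # Assumption: one model per subject = mean of that subject's training i-vectors.
    means = np.stack([train_ivecs[labels == s].mean(axis=0) for s in subjects])
    means = means / np.linalg.norm(means, axis=1, keepdims=True)
    x = test_ivec / np.linalg.norm(test_ivec)
    scores = means @ x  # cosine similarity to each subject model
    return subjects[int(np.argmax(scores))]


def init_layers(sizes, rng):
    """Hypothetical placeholder init. In a DBM-DNN these weights would be
    copied from the pretrained DBM rather than drawn at random."""
    Ws = [rng.normal(0.0, 0.01, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(n) for n in sizes[1:]]
    return Ws, bs


def dnn_forward(x, Ws, bs):
    """Sigmoid hidden layers followed by a softmax over target subjects."""
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    logits = h @ Ws[-1] + bs[-1]
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_subjects = 5  # hypothetical number of target subjects
    Ws, bs = init_layers([400, 800, 800, n_subjects], rng)
    probs = dnn_forward(rng.normal(size=(3, 400)), Ws, bs)
    print(probs.argmax(axis=1))  # predicted subject index per i-vector
```

Since cosine distance is one minus cosine similarity, taking the arg-max of the similarity is equivalent to taking the arg-min of the distance.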
Year | DOI | Venue |
---|---|---|
2015 | 10.1007/978-3-319-29451-3_50 | PSIVT |
Field | DocType | Volume |
I vector, Boltzmann machine, Pattern recognition, Computer science, Cosine Distance, Deep belief network, Speaker recognition, Test data, Artificial intelligence, Boltzmann constant, Classifier (linguistics) | Conference | 9431 |
ISSN | Citations | PageRank |
0302-9743 | 2 | 0.36 |
References | Authors | |
19 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mohammad Rafiqul Alam | 1 | 8 | 2.54 |
M. Bennamoun | 2 | 3197 | 167.23 |
Roberto Togneri | 3 | 814 | 48.33 |
Ferdous Ahmed Sohel | 4 | 623 | 31.78 |