When Hearing the Voice, Who Will Come to Your Mind - Citegraph

Paper Info

Title
When Hearing the Voice, Who Will Come to Your Mind

Abstract
Speech carries rich biological information, such as speaker identity attributes including age, gender, and race. In this paper, we explore the use of a self-supervised method to extract speaker identity information from high-dimensional speech representations in order to generate face images. In addition, because the biological information contained in a piece of speech can also be expressed in other modalities (such as images), we design a cross-modal knowledge distillation method that transfers feature information from the visual domain to the speech domain. The feature vectors obtained through self-supervised learning and knowledge distillation are fed into a GAN-based generative model to produce facial images that reflect the speaker's characteristics. Subjective experiments show that our model performs well on the speaker identification task. Experiments also show that the proposed method can effectively establish the connection between different modalities and generate faces with rich biological information.
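The cross-modal distillation step described above can be illustrated with a minimal sketch of generic feature-level distillation, in which a student network fed with speech features is trained to reproduce the embeddings of a frozen face encoder (the teacher). The abstract does not specify the architecture or loss, so everything here is an assumption: the linear student, the mean-squared-error objective, the function name `distill_linear_student`, and the synthetic data are hypothetical and not the authors' method.

```python
import numpy as np


def distill_linear_student(speech_feats, face_embs, lr=0.5, steps=1000):
    """Hypothetical feature-level distillation with a linear student.

    speech_feats: (N, d_s) speech representations (assumed to come from a
                  self-supervised speech encoder).
    face_embs:    (N, d_f) embeddings from a frozen face encoder (teacher).
    Fits W so that speech_feats @ W approximates face_embs under MSE, using
    plain gradient descent. Returns W and the per-step MSE losses.
    """
    d_s = speech_feats.shape[1]
    d_f = face_embs.shape[1]
    W = np.zeros((d_s, d_f))
    losses = []
    for _ in range(steps):
        diff = speech_feats @ W - face_embs        # student minus teacher
        losses.append(float(np.mean(diff ** 2)))   # MSE over all entries
        # Gradient of mean((XW - T)^2) with respect to W.
        grad = 2.0 * speech_feats.T @ diff / diff.size
        W -= lr * grad
    return W, losses


# Synthetic demo (assumption): the teacher targets are an exact linear
# function of the speech features, so the student can drive the loss to ~0.
rng = np.random.default_rng(0)
speech = rng.normal(size=(64, 16))   # N=64 clips, d_s=16
W_true = rng.normal(size=(16, 8))    # hypothetical mapping, d_f=8
faces = speech @ W_true              # stand-in for frozen face-encoder outputs
W_student, losses = distill_linear_student(speech, faces)
print(f"initial MSE {losses[0]:.3f}, final MSE {losses[-1]:.2e}")
```

In a real system the student would be a deep speech encoder and the teacher a pretrained face network; only the student's parameters are updated, so knowledge flows from the visual domain into the speech domain as the abstract describes.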

Year: 2021
DOI: 10.1109/IJCNN52387.2021.9534208
Venue: 2021 International Joint Conference on Neural Networks (IJCNN)
Keywords: speech representation, self-supervised learning, cross-modal distillation, visual reconstruction, facial synthesis
DocType: Conference
ISSN: 2161-4393
Citations: 0
PageRank: 0.34
References: 0
Authors: 8


Name	Order	Citations	PageRank
Zhenhou Hong	1	0	0.34
Jianzong Wang	2	61	34.65
Wenqi Wei	3	48	10.69
Jie Liu	4	0	0.68
Xiaoyang Qu	5	0	1.35
Bo Chen	6	0	0.68
Zihang Wei	7	0	0.68
Jing Xiao	8	7	5.78
