An Integrated Framework for Two-Pass Personalized Voice Trigger. - Citegraph

Paper Info

Title
An Integrated Framework for Two-Pass Personalized Voice Trigger.

Abstract
In this paper, we present the XMUSPEECH system for Task 1 of the 2020 Personalized Voice Trigger Challenge (PVTC2020). Task 1 is joint wake-up word detection and speaker verification on close-talking data. The whole system consists of a keyword spotting (KWS) sub-system and a speaker verification (SV) sub-system. For the KWS system, we apply a Temporal Depthwise Separable Convolution Residual Network (TDSC-ResNet) to improve performance. For the SV system, we propose a multi-task learning network in which the phonetic branch is trained with the character labels of the utterance and the speaker branch is trained with the speaker label. The phonetic branch is optimized with connectionist temporal classification (CTC) loss and serves as an auxiliary module for the speaker branch. Experiments show that our system achieves significant improvements over the baseline system.

Year	DOI	Venue
2021	10.21437/Interspeech.2021-2161	Interspeech
DocType	Citations	PageRank
Conference	0	0.34
References	Authors
0	6

Authors (6 rows)

Name	Order	Citations	PageRank
Dexin Liao	1	0	1.01
Jing Li	2	52	43.73
Yiming Zhi	3	0	1.35
Song Li	4	0	1.69
Q. Y. Hong	5	50	15.79
Lin Li	6	36	18.06
