Title
SPOKEN LANGUAGE IDENTIFICATION IN UNSEEN TARGET DOMAIN USING WITHIN-SAMPLE SIMILARITY LOSS
Abstract
State-of-the-art spoken language identification (LID) networks are vulnerable to channel-mismatch that occurs due to the differences in the channels used to obtain the training and testing samples. The effect of channel-mismatch is severe when the training dataset contains very limited channel diversity. One way to address channel-mismatch is by learning a channel-invariant representation of the speech using adversarial multi-task learning (AMTL). But, AMTL approach cannot be used when the training samples do not contain the corresponding channel labels. To address this, we propose an auxiliary within-sample similarity loss (WSSL) which encourages the network to suppress the channel-specific contents in the speech. This does not require any channel labels. Specifically, WSSL gives the similarity between a pair of embeddings of same sample obtained by two separate embedding extractors. These embedding extractors are designed to capture similar information about the channel, but dissimilar LID-specific information in the speech. Furthermore, the proposed WSSL improves the noise-robustness of the LID-network by suppressing the background noise in the speech to some extent. We demonstrate the effectiveness of the proposed approach in both seen and unseen channel conditions using a set of datasets having significant channel-mismatch.
Year
DOI
Venue
2021
10.1109/ICASSP39728.2021.9414090
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021)
Keywords
DocType
Citations 
channel-mismatch, spoken language identification, within-sample similarity loss, domain-mismatch
Conference
0
PageRank 
References 
Authors
0.34
0
4
Name
Order
Citations
PageRank
H. Muralikrishna111.04
Shantanu Kapoor200.34
Aroor Dinesh Dileep3124.74
Padmanabhan Rajan4227.63