Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition | 0 | 0.34 | 2022 |
Non-Parallel Voice Conversion for ASR Augmentation | 0 | 0.34 | 2022 |
Multilingual Second-Pass Rescoring for Automatic Speech Recognition Systems | 0 | 0.34 | 2022 |
On Adaptive Weight Interpolation of the Hybrid Autoregressive Transducer | 0 | 0.34 | 2022 |
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition | 2 | 0.37 | 2022 |
Improving Rare Word Recognition with LM-aware MWER Training | 0 | 0.34 | 2022 |
Ask2Mask: Guided Data Selection for Masked Speech Modeling | 0 | 0.34 | 2022 |
MAESTRO: Matched Speech Text Representations through Modality Matching | 0 | 0.34 | 2022 |
Reducing Domain mismatch in Self-supervised speech pre-training | 0 | 0.34 | 2022 |
Extending Parrotron: An End-to-End, Speech Conversion and Speech Recognition Model for Atypical Speech | 0 | 0.34 | 2021 |
Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection | 0 | 0.34 | 2020 |
SCADA - Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR | 0 | 0.34 | 2020 |
Multilingual Speech Recognition with Self-Attention Structured Parameterization | 0 | 0.34 | 2020 |
Speech Recognition With Augmented Synthesized Speech | 1 | 0.36 | 2019 |
Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning | 1 | 0.35 | 2019 |
Fast Neural Network Language Model Lookups At N-Gram Speeds | 0 | 0.34 | 2017 |
Bias And Statistical Significance In Evaluating Speech Synthesis With Mean Opinion Scores | 4 | 0.39 | 2017 |
Weakly-Supervised Phrase Assignment From Text In A Speech-Synthesis System Using Noisy Labels | 0 | 0.34 | 2017 |
Introduction To The Special Issue On End-To-End Speech And Language Processing | 0 | 0.34 | 2017 |
Recent progress in deep end-to-end models for spoken language processing | 0 | 0.34 | 2017 |
English Conversational Telephone Speech Recognition By Humans And Machines | 38 | 1.72 | 2017 |
Efficient Knowledge Distillation From An Ensemble Of Teachers | 7 | 0.63 | 2017 |
Invariant Representations for Noisy Speech Recognition | 1 | 0.35 | 2016 |
Using Deep Bidirectional Recurrent Neural Networks For Prosodic-Target Prediction In A Unit-Selection Text-To-Speech System | 2 | 0.43 | 2015 |
Multilingual representations for low resource speech recognition and keyword search | 11 | 0.55 | 2015 |
Bidirectional Recurrent Neural Network Language Models For Automatic Speech Recognition | 8 | 0.56 | 2015 |
A Multi-Region Deep Neural Network Model In Speech Recognition | 0 | 0.34 | 2015 |
Diverse Embedding Neural Network Language Models | 0 | 0.34 | 2014 |
Automatic keyword selection for keyword search development and tuning | 7 | 0.43 | 2014 |
Kernel methods match Deep Neural Networks on TIMIT | 31 | 1.25 | 2014 |
Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks | 20 | 0.75 | 2014 |
A high-performance Cantonese keyword search system | 5 | 0.49 | 2013 |
Improving training time of Hessian-free optimization for deep neural networks using preconditioning and sampling | 0 | 0.34 | 2013 |
Generalized Ambiguity Decomposition for Understanding Ensemble Diversity | 0 | 0.34 | 2013 |
System combination and score normalization for spoken term detection | 36 | 1.87 | 2013 |
Deep convolutional neural networks for LVCSR | 64 | 2.92 | 2013 |
An Evaluation Of Posterior Modeling Techniques For Phonetic Recognition | 0 | 0.34 | 2013 |
Learning filter banks within a deep neural network framework | 35 | 1.41 | 2013 |
An empirical study of confusion modeling in keyword search for low resource languages | 14 | 0.72 | 2013 |
Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks | 15 | 0.71 | 2013 |
F0 Contour Prediction With A Deep Belief Network-Gaussian Process Hybrid Model | 14 | 0.84 | 2013 |
Exemplar-Based Processing for Speech Recognition: An Overview | 33 | 1.05 | 2012 |
Deep neural network language models | 64 | 3.38 | 2012 |
Leveraging word confusion networks for named entity modeling and detection from conversational telephone speech | 3 | 0.42 | 2012 |
Constructing ensembles of dissimilar acoustic models using hidden attributes of training data | 0 | 0.34 | 2012 |
Acoustically discriminative language model training with pseudo-hypothesis | 3 | 0.39 | 2012 |
Auto-encoder bottleneck features using deep belief networks | 67 | 6.39 | 2012 |
Trends in Speech and Language Processing [In the Spotlight] | 4 | 0.45 | 2012 |
Clustering With Modified Cosine Distance Learned From Constraints | 0 | 0.34 | 2011 |
Exploiting Active-Learning Strategies For Annotating Prosodic Events With Limited Labeled Data | 0 | 0.34 | 2011 |