Name: NAOYUKI KANDA
Affiliation: Hitachi Ltd, Cent Res Lab, 1-280 Higashi Koigakubo, Kokubunji, Tokyo 1858601, Japan
Papers: 46
Collaborators: 104
Citations: 103
PageRank: 19.45
Referers: 321
Referees: 678
References: 238
Title | Citations | PageRank | Year
VarArray: Array-Geometry-Agnostic Continuous Speech Separation. | 0 | 0.34 | 2022
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition | 0 | 0.34 | 2022
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR | 0 | 0.34 | 2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing | 7 | 0.42 | 2022
A review of speaker diarization: Recent advances with deep learning | 2 | 0.41 | 2022
All-Neural Beamformer for Continuous Speech Separation. | 0 | 0.34 | 2022
Streaming End-To-End Multi-Talker Speech Recognition | 0 | 0.34 | 2021
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer. | 1 | 0.35 | 2021
Integration Of Speech Separation, Diarization, And Recognition For Multi-Speaker Meetings: System Description, Comparison, And Analysis | 0 | 0.34 | 2021
Investigation Of End-To-End Speaker-Attributed Asr For Continuous Multi-Talker Recordings | 0 | 0.34 | 2021
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification. | 0 | 0.34 | 2021
MINIMUM BAYES RISK TRAINING FOR END-TO-END SPEAKER-ATTRIBUTED ASR | 0 | 0.34 | 2021
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition. | 1 | 0.35 | 2021
End-to-End Speaker-Attributed ASR with Transformer. | 0 | 0.34 | 2021
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio | 0 | 0.34 | 2021
Exploring End-To-End Multi-Channel Asr With Bias Information For Meeting Transcription | 0 | 0.34 | 2021
HYPOTHESIS STITCHER FOR END-TO-END SPEAKER-ATTRIBUTED ASR ON LONG-FORM MULTI-TALKER RECORDINGS | 0 | 0.34 | 2021
INTERNAL LANGUAGE MODEL TRAINING FOR DOMAIN-ADAPTIVE END-TO-END SPEECH RECOGNITION | 0 | 0.34 | 2021
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone. | 1 | 0.35 | 2021
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition | 2 | 0.37 | 2021
SPEECH-LANGUAGE PRE-TRAINING FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING | 1 | 0.35 | 2021
Investigation of Practical Aspects of Single Channel Speech Separation for ASR. | 1 | 0.35 | 2021
MICROSOFT SPEAKER DIARIZATION SYSTEM FOR THE VOXCELEB SPEAKER RECOGNITION CHALLENGE 2020 | 0 | 0.34 | 2021
Serialized Output Training for End-to-End Overlapped Speech Recognition | 0 | 0.34 | 2020
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers | 0 | 0.34 | 2020
Acoustic Modeling For Distant Multi-Talker Speech Recognition With Single- And Multi-Channel Branches | 0 | 0.34 | 2019
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition. | 0 | 0.34 | 2019
Multimodal Response Obligation Detection with Unsupervised Online Domain Adaptation | 0 | 0.34 | 2019
End-to-End Neural Speaker Diarization with Permutation-Free Objectives | 7 | 0.55 | 2019
End-to-End Neural Speaker Diarization with Self-Attention | 3 | 0.51 | 2019
Guided Source Separation Meets a Strong ASR Backend - Hitachi/Paderborn University Joint Investigation for Dinner Party ASR. | 4 | 0.48 | 2019
Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models | 0 | 0.34 | 2019
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR. | 0 | 0.34 | 2019
Face-Voice Matching using Cross-modal Embeddings. | 0 | 0.34 | 2018
Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models. | 1 | 0.35 | 2017
Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence | 0 | 0.34 | 2017
Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription. | 1 | 0.39 | 2016
Training data pseudo-shuffling and direct decoding framework for recurrent neural network based acoustic modeling | 1 | 0.35 | 2015
The NCT ASR system for IWSLT 2014. | 0 | 0.34 | 2014
Multiple index combination for Japanese spoken term detection with optimum index selection based on OOV-region classifier | 0 | 0.34 | 2013
Elastic Spectral Distortion For Low Resource Speech Recognition With Deep Neural Networks | 12 | 0.86 | 2013
Voice activity detection based on augmented statistical noise suppression. | 0 | 0.34 | 2012
A multi-expert model for dialogue and behavior control of conversational robots and agents | 11 | 0.74 | 2011
Open-vocabulary keyword detection from super-large scale speech database | 12 | 1.07 | 2008
Multi-domain spoken dialogue system with extensibility and robustness against speech recognition errors | 25 | 1.32 | 2006
Contextual constraints based on dialogue models in database search task for spoken dialogue systems | 10 | 0.79 | 2005