Title
Cocktail Party Processing via Structured Prediction.
Abstract
While human listeners excel at selectively attending to a conversation at a cocktail party, machine performance is still far inferior by comparison. We show that the cocktail party problem, or the speech separation problem, can be effectively approached via structured prediction. To account for temporal dynamics in speech, we employ conditional random fields (CRFs) to classify speech dominance within each time-frequency unit for a sound mixture. To capture the complex, nonlinear relationship between input and output, both state and transition feature functions in CRFs are learned by deep neural networks. The formulation of the problem as classification allows us to directly optimize a measure that is well correlated with human speech intelligibility. The proposed system substantially outperforms existing ones in a variety of noises.
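As a rough illustration of why modeling temporal dynamics matters when labeling time-frequency units, the sketch below runs Viterbi decoding over a binary linear-chain model for a single frequency channel. It is not the authors' implementation: the function name `viterbi_binary`, the score tables, and the label convention (1 = speech-dominant, 0 = noise-dominant) are assumptions made for this example, and the hand-written scores stand in for the state and transition feature functions that the abstract says are learned by deep neural networks.

```python
def viterbi_binary(state_scores, transition_scores):
    """Return the highest-scoring 0/1 label sequence for one frequency channel.

    state_scores[t][y]      : score of label y at frame t (hypothetical values).
    transition_scores[a][b] : score of moving from label a to label b.
    Label 1 is assumed to mean "speech-dominant", label 0 "noise-dominant".
    """
    n_frames = len(state_scores)
    if n_frames == 0:
        return []

    best = [list(state_scores[0])]  # best[t][y]: best path score ending in y
    back = []                       # back[t-1][y]: best previous label for y

    for t in range(1, n_frames):
        row, ptr = [], []
        for y in (0, 1):
            cands = [best[-1][p] + transition_scores[p][y] for p in (0, 1)]
            p_best = 0 if cands[0] >= cands[1] else 1
            row.append(cands[p_best] + state_scores[t][y])
            ptr.append(p_best)
        best.append(row)
        back.append(ptr)

    # Backtrack from the best final label.
    y = 0 if best[-1][0] >= best[-1][1] else 1
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    path.reverse()
    return path


# Toy scores for three frames; the middle frame weakly prefers label 0.
scores = [[0.1, 1.0], [0.6, 0.5], [0.2, 0.9]]
no_dynamics = [[0.0, 0.0], [0.0, 0.0]]  # reduces to frame-by-frame decisions
smooth = [[0.5, 0.0], [0.0, 0.5]]       # rewards keeping the same label

independent_labels = viterbi_binary(scores, no_dynamics)
smoothed_labels = viterbi_binary(scores, smooth)
```

With zero transition scores the decoder reduces to independent per-frame classification, while the transition scores that favor staying in the same label override the weak evidence in the middle frame. This is only meant to show the role of the transition term in a toy setting.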
Year | Venue | Field
2012 | NIPS | Conditional random field, Nonlinear system, Conversation, Cocktail party effect, Computer science, Structured prediction, Speech recognition, Input/output, Artificial intelligence, CRFS, Machine learning, Intelligibility (communication)
DocType | Citations | PageRank
Conference | 16 | 1.08
References | Authors
11 | 2
Name | Order | Citations | PageRank
Yu-Xuan Wang | 1 | 650 | 32.68
Wang, Deliang | 2 | 16 | 1.08