Cocktail Party Processing via Structured Prediction

2012-12-01, NeurIPS 2012

Yuxuan Wang, DeLiang Wang

Abstract

While human listeners excel at selectively attending to a conversation at a cocktail party, machine performance is still far inferior by comparison. We show that the cocktail party problem, or the speech separation problem, can be effectively approached via structured prediction. To account for temporal dynamics in speech, we employ conditional random fields (CRFs) to classify speech dominance within each time-frequency unit of a sound mixture. To capture the complex, nonlinear relationship between input and output, both state and transition feature functions in the CRFs are learned by deep neural networks. Formulating the problem as classification allows us to directly optimize a measure that is well correlated with human speech intelligibility. The proposed system substantially outperforms existing ones across a variety of noises.
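To make the structured-prediction idea concrete, the following is a minimal sketch of decoding a linear-chain CRF over binary speech-dominance labels for a single frequency channel. It is an illustration under stated assumptions, not the authors' implementation: the function name `viterbi_binary`, the score layout, and the use of a fixed transition table (rather than input-dependent transition scores learned by a network) are all hypothetical simplifications.

```python
from typing import List, Sequence


def viterbi_binary(state_scores: Sequence[Sequence[float]],
                   trans_scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring 0/1 label sequence for one frequency channel.

    state_scores[t][y]: score of label y (1 = speech-dominant) at frame t.
        Assumed here to come from some per-unit classifier (hypothetical).
    trans_scores[a][b]: score of moving from label a to label b.
        Held fixed across frames here purely for simplicity.
    """
    if not state_scores:
        return []
    n_labels = 2
    best = list(state_scores[0])   # best path score ending in each label
    backptr = []                   # backptr[t-1][b] = best previous label
    for t in range(1, len(state_scores)):
        new_best, pointers = [], []
        for b in range(n_labels):
            cands = [best[a] + trans_scores[a][b] for a in range(n_labels)]
            a_star = max(range(n_labels), key=lambda a: cands[a])
            new_best.append(cands[a_star] + state_scores[t][b])
            pointers.append(a_star)
        best = new_best
        backptr.append(pointers)
    # Trace back from the best final label.
    label = max(range(n_labels), key=lambda y: best[y])
    path = [label]
    for pointers in reversed(backptr):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


# Frame 1 weakly prefers "noise-dominant" when judged in isolation.
scores = [[0.0, 1.0], [0.6, 0.5], [0.0, 1.0]]
smooth = [[0.5, -0.5], [-0.5, 0.5]]   # rewards staying in the same label
no_trans = [[0.0, 0.0], [0.0, 0.0]]   # reduces to per-frame classification

print(viterbi_binary(scores, no_trans))
print(viterbi_binary(scores, smooth))
```

With the zero transition table, decoding reduces to an independent decision per time-frequency unit. With the smoothing table, the weakly scored middle frame is relabeled to agree with its neighbors, which is the kind of temporal dependency a CRF is meant to capture.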
