Speaker Clustering in Textual Dialogue with Utterance Correlation and Cross-corpus Dialogue Act Supervision

2022-01-16 · ACL ARR January 2022

Anonymous


Abstract

We propose a speaker clustering model for textual dialogue. Given a multi-party dialogue without speaker annotations, it groups the utterances so that all utterances in a cluster come from the same real speaker. We find that, even when the speakers are unknown, the interactions between utterances are still implied in the text, and these interactions indicate how the speakers are correlated. In this work, we model the semantic content of each utterance with a pre-trained language model, and the correlations between speakers with an utterance-level pairwise matrix. The semantic content representation can be further enhanced by additional dialogue act modeling with cross-corpus supervision. The speaker labels are finally generated by spectral clustering. Experiments show that our model outperforms a sequence classification baseline and benefits from the set-specific dialogue act classification auxiliary task. We also discuss the details of correlation modeling and the step-wise training process.
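The last step of the pipeline, turning an utterance-level pairwise matrix into speaker labels by spectral clustering, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes only two speakers, a symmetric non-negative affinity matrix in which every utterance has positive total affinity, and a hand-written toy matrix in place of model output. The function name `two_way_spectral_split` and the toy values are hypothetical. The split uses the sign of the second eigenvector of the normalized affinity matrix, found by power iteration in pure Python.

```python
import math
import random


def two_way_spectral_split(affinity, iterations=200, seed=0):
    """Split items into two clusters from a symmetric, non-negative affinity matrix.

    Assumption: every row has a positive sum. The split is the sign pattern of
    the second eigenvector of D^{-1/2} W D^{-1/2}, found by power iteration.
    """
    n = len(affinity)
    degree = [sum(row) for row in affinity]
    inv_sqrt = [1.0 / math.sqrt(d) for d in degree]

    # Shifted normalized affinity (M + I) / 2 has eigenvalues in [0, 1],
    # so power iteration converges to the algebraically largest ones.
    shifted = [
        [
            0.5 * (affinity[i][j] * inv_sqrt[i] * inv_sqrt[j] + (1.0 if i == j else 0.0))
            for j in range(n)
        ]
        for i in range(n)
    ]

    # The top eigenvector is proportional to sqrt(degree); project it out each step.
    norm = math.sqrt(sum(degree))
    top = [math.sqrt(d) / norm for d in degree]

    rng = random.Random(seed)
    vec = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iterations):
        proj = sum(v * t for v, t in zip(vec, top))
        vec = [v - proj * t for v, t in zip(vec, top)]
        vec = [sum(shifted[i][j] * vec[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(v * v for v in vec))
        vec = [v / length for v in vec]

    # Which side gets label 1 is arbitrary; only the grouping is meaningful.
    return [1 if v >= 0 else 0 for v in vec]


# Toy dialogue (hypothetical values): six utterances, two speakers taking turns.
speakers = [0, 1, 0, 1, 0, 1]
toy_affinity = [
    [0.0 if i == j else (0.9 if speakers[i] == speakers[j] else 0.1) for j in range(6)]
    for i in range(6)
]
labels = two_way_spectral_split(toy_affinity)
```

On this toy matrix, utterances 0, 2, 4 receive one label and utterances 1, 3, 5 the other. A dialogue with more than two speakers would need several eigenvectors followed by a clustering step such as k-means, which this sketch leaves out.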

Tasks

Clustering, Cross-corpus, Dialogue Act Classification, Language Modeling
