Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning

2018-10-01 · EMNLP 2018

Chen Shi, Qi Chen, Lei Sha, Sujian Li, Xu Sun, Houfeng Wang, Lintao Zhang

Abstract

The lack of labeled data is one of the main challenges when building a task-oriented dialogue system. Existing dialogue datasets usually rely on human labeling, which is expensive, limited in size, and low in coverage. In this paper, we instead propose a framework, Auto-Dialabel, that automatically clusters dialogue intents and slots. In this framework, we collect a set of context features, leverage an autoencoder for feature assembly, and adapt a dynamic hierarchical clustering method for intent and slot labeling. Experimental results show that our framework can greatly reduce human labeling cost, achieve good intent clustering accuracy (84.1%), and provide reasonable and instructive slot labeling results.
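
To make the clustering step concrete, here is a minimal sketch of threshold-based agglomerative clustering over utterances. It is not the authors' implementation: it assumes plain bag-of-words counts in place of the paper's context features, skips the autoencoder entirely, and uses average linkage with a fixed cosine-distance cutoff as one simple way to let the number of clusters vary. The names `bag_of_words`, `cosine_distance`, `cluster_utterances`, and the `threshold` parameter are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(utterance):
    """Sparse word-count vector; an assumed stand-in for the paper's features."""
    return Counter(utterance.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def cluster_utterances(utterances, threshold=0.6):
    """Average-linkage agglomerative clustering.

    Merging stops once the closest pair of clusters is farther apart than
    `threshold`, so the number of clusters is not fixed in advance.
    """
    vectors = [bag_of_words(u) for u in utterances]
    clusters = [[i] for i in range(len(utterances))]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters (x < y).
        dist, x, y = min(
            ((linkage(clusters[x], clusters[y]), x, y)
             for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if dist > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    return [[utterances[i] for i in c] for c in clusters]


if __name__ == "__main__":
    groups = cluster_utterances([
        "book a flight to boston",
        "book a flight to denver",
        "play some jazz music",
        "play some rock music",
    ])
    for group in groups:
        print(group)
```

Each resulting group would then be shown to a human annotator as a candidate intent, which is where the labeling savings described in the abstract would come from.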

Tasks

Active Learning, Clustering
