A Hybrid Architecture for Out of Domain Intent Detection and Intent Discovery

2023-03-07

Masoud Akbari, Ali Mohades, M. Hassan Shirali-Shahreza

Code

github.com/Makbari1997/VAE-KPCA-HDBSCAN
Official · tf · ★ 11

Abstract

Intent Detection is one of the tasks of the Natural Language Understanding (NLU) unit in task-oriented dialogue systems. Out of Scope (OOS) and Out of Domain (OOD) inputs can cause these systems to fail. In addition, training an Intent Detection model requires a labeled dataset, and creating one is time-consuming and requires human annotators. This article addresses both problems. The task of identifying OOD/OOS inputs is called OOD/OOS Intent Detection, while discovering new intents and pseudo-labeling OOD inputs is known as Intent Discovery. For OOD intent detection, we use a Variational Autoencoder to distinguish between known and unknown intents independently of the input data distribution. An unsupervised clustering method is then used to discover the different unknown intents underlying the OOD/OOS inputs. We also apply non-linear dimensionality reduction to the OOD/OOS representations to make the distances between representations more meaningful for clustering. Our results show that the proposed model achieves strong results and outperforms the baselines on both OOD/OOS Intent Detection and Intent Discovery in English and Persian.
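
The abstract does not say how the Variational Autoencoder's output is turned into a known/unknown decision. One common way to do this, shown here purely as an illustrative assumption and not as the authors' method, is to score each utterance (for example by reconstruction loss) and flag anything above a threshold fitted on known-intent validation data. The function names, the `k` parameter, and the mean-plus-k-standard-deviations rule are all hypothetical.

```python
from statistics import mean, stdev


def fit_threshold(in_domain_scores, k=2.0):
    """Fit a cutoff from scores of known-intent validation utterances.

    Assumption: higher score = less like the training data
    (e.g. a VAE reconstruction loss). The cutoff is mean + k * std.
    """
    return mean(in_domain_scores) + k * stdev(in_domain_scores)


def split_by_threshold(scores, threshold):
    """Return (known_indices, ood_indices) for a list of scores."""
    known = [i for i, s in enumerate(scores) if s <= threshold]
    ood = [i for i, s in enumerate(scores) if s > threshold]
    return known, ood


# Hypothetical scores; real values would come from the trained model.
validation_scores = [0.9, 1.0, 1.1, 1.0]
threshold = fit_threshold(validation_scores)
known, ood = split_by_threshold([1.0, 2.5, 0.95, 3.0], threshold)
```

In a pipeline like the one described, only the utterances at the `ood` indices would be passed on to dimensionality reduction and clustering.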

Tasks

Clustering, Dimensionality Reduction, Intent Detection, Intent Discovery, Natural Language Understanding, Out of Distribution (OOD) Detection, Task-Oriented Dialogue Systems

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
ATIS	k-PCA + HDBSCAN	ARI	74.94	—	Unverified
Persian-ATIS	k-PCA + HDBSCAN	ARI	11.97	—	Unverified
SNIPS	k-PCA + HDBSCAN	ARI	59.23	—	Unverified
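
The metric in this table, ARI (Adjusted Rand Index), measures how well a clustering agrees with reference labels while correcting for chance agreement. The following is a minimal standard-library sketch of the usual pair-counting formula, included only to make the metric concrete. It is not taken from the paper's code, and it assumes the table values are the raw index multiplied by 100.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Count pairs that share a cluster in both, in truth, and in prediction.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / comb(n, 2)  # chance-level agreement
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Cluster ids are arbitrary, so a relabeled but identical partition scores 1.0.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index depends only on which items are grouped together, discovered clusters do not need to be matched to intent names before scoring.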
