A Repository of Conversational Datasets
Matthew Henderson, Paweł Budzianowski, Iñigo Casanueva, Sam Coope, Daniela Gerz, Girish Kumar, Nikola Mrkšić, Georgios Spithourakis, Pei-Hao Su, Ivan Vulić, Tsung-Hsien Wen
Code
- github.com/PolyAI-LDN/conversational-datasets (official implementation, referenced in the paper; TensorFlow)
Abstract
Progress in Machine Learning is often driven by the availability of large datasets, and consistent evaluation metrics for comparing modeling approaches. To this end, we present a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using '1-of-100 accuracy'. The repository contains scripts that allow researchers to reproduce the standard datasets, or to adapt the pre-processing and data filtering steps to their needs. We introduce and evaluate several competitive baselines for conversational response selection, whose implementations are shared in the repository, as well as a neural encoder model that is trained on the entire training set.
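Under the '1-of-100 accuracy' procedure described above, each test context is scored against its true response and 99 distractor responses drawn from the same batch, and the model is credited when the true response receives the highest score. A minimal NumPy sketch of this metric (the function name is ours, not from the repository; scores are assumed to come from a batch of 100 context–response pairs, where response *i* is the true response for context *i*):

```python
import numpy as np

def one_of_100_accuracy(scores: np.ndarray) -> float:
    """Compute 1-of-100 accuracy from a [100, 100] score matrix.

    scores[i, j] is the model's score for pairing context i with
    response j. Response i is the true response for context i; the
    other 99 responses in the batch act as distractors.
    """
    predictions = scores.argmax(axis=1)   # highest-scoring response per context
    targets = np.arange(scores.shape[0])  # true response shares the context's index
    return float((predictions == targets).mean())

# Example: a score matrix whose diagonal dominates ranks every
# true response first, giving perfect 1-of-100 accuracy.
rng = np.random.default_rng(0)
scores = np.eye(100) + 0.01 * rng.random((100, 100))
print(one_of_100_accuracy(scores))  # 1.0
```

Because the distractors are simply the other responses in the batch, the metric can be computed directly from the batch score matrix produced during evaluation, with no extra negative sampling.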
Benchmark Results
| Dataset | Model | Metric | Claimed (%) | Verified (%) | Status |
|---|---|---|---|---|---|
| PolyAI AmazonQA | PolyAI Encoder | 1-of-100 Accuracy | 71.3 | — | Unverified |
| PolyAI OpenSubtitles | PolyAI Encoder | 1-of-100 Accuracy | 30.6 | — | Unverified |
| PolyAI Reddit | PolyAI Encoder | 1-of-100 Accuracy | 61.3 | — | Unverified |