
A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC

2018-09-27 · NAACL 2019 · Code Available

Mark Yatskar

Abstract

We compare three new datasets for question answering: SQuAD 2.0, QuAC, and CoQA, along several of their new features: (1) unanswerable questions, (2) multi-turn interactions, and (3) abstractive answers. We show that the datasets provide complementary coverage of the first two aspects, but weak coverage of the third. Because of the datasets' structural similarity, a single extractive model can be easily adapted to any of the datasets, and we show improved baseline results on both SQuAD 2.0 and CoQA. Despite the similarity, models trained on one dataset are ineffective on another, but we find moderate performance improvement through pretraining. To encourage cross-evaluation, we release code for conversion between datasets at https://github.com/my89/co-squac.
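The co-squac repository provides converters between the three formats. To give a rough sense of what such a conversion involves, the sketch below maps a CoQA-style file into SQuAD 2.0's extractive layout by folding earlier dialog turns into each question and using the rationale span as the answer. The function name, the history_turns parameter, and the history-folding scheme are illustrative assumptions, not the repository's actual code; the JSON field names follow the public CoQA and SQuAD 2.0 releases.

```python
import json

def coqa_to_squad(coqa_path, out_path, history_turns=2):
    """Illustrative CoQA -> SQuAD 2.0 conversion (sketch, not co-squac's code)."""
    with open(coqa_path) as f:
        coqa = json.load(f)

    articles = []
    for passage in coqa["data"]:
        qas = []
        history = []
        for q, a in zip(passage["questions"], passage["answers"]):
            # Fold the previous turns into the question so a single-turn
            # extractive model still sees the dialog context.
            prefix = " ".join(history[-2 * history_turns:])
            question = (prefix + " " + q["input_text"]).strip()

            # Use the rationale span as the extractive answer; assumed here
            # that unanswerable turns carry span_start == -1.
            unanswerable = a["span_start"] == -1
            answers = [] if unanswerable else [{
                "text": a["span_text"],
                "answer_start": a["span_start"],
            }]
            qas.append({
                "id": f'{passage["id"]}_{q["turn_id"]}',
                "question": question,
                "answers": answers,
                "is_impossible": unanswerable,
            })
            history.extend([q["input_text"], a["input_text"]])

        articles.append({
            "title": passage.get("source", ""),
            "paragraphs": [{"context": passage["story"], "qas": qas}],
        })

    with open(out_path, "w") as f:
        json.dump({"version": "v2.0", "data": articles}, f)
```

Calling, for example, coqa_to_squad("coqa-dev-v1.0.json", "coqa-dev-as-squad.json") would then let any SQuAD 2.0 reader consume the converted file.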

Tasks

Question Answering

Benchmark Results

| Dataset | Model                  | Metric    | Claimed | Verified | Status     |
|---------|------------------------|-----------|---------|----------|------------|
| CoQA    | BiDAF++ (single model) | In-domain | 69.4    |          | Unverified |

Reproductions