Dialogue Evaluation

Papers

Showing 91–97 of 97 papers

Title	Date	Tasks	Status	Hype
Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems	Jun 21, 2019	Dialogue Evaluation, Knowledge Distillation	Code Available	0
Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings	Apr 24, 2019	Dialogue Evaluation, valid	Unverified	0
Evaluating Coherence in Dialogue Systems using Entailment	Apr 6, 2019	Dialogue Evaluation, Diversity	Code Available	0
Re-evaluating ADEM: A Deeper Look at Scoring Dialogue Responses	Feb 23, 2019	Dialogue Evaluation, Response Generation	Unverified	0
One "Ruler" for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning	May 8, 2018	All, Dialogue Evaluation	Unverified	0
Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses	Aug 23, 2017	Dialogue Evaluation	Code Available	0
Adversarial Learning for Neural Dialogue Generation	Jan 23, 2017	Dialogue Evaluation, Dialogue Generation	Code Available	0

Page 10 of 10

Datasets: USR-TopicalChat, USR-PersonaChat

Benchmark Results

USR-TopicalChat

#	Model	Metric	Claimed	Verified	Status
1	MDD-Eval	Spearman Correlation	0.51	—	Unverified
2	Lin-Reg (all)	Spearman Correlation	0.49	—	Unverified
3	USR	Spearman Correlation	0.42	—	Unverified
4	USR - DR (x = c)	Spearman Correlation	0.32	—	Unverified
5	USR - MLM	Spearman Correlation	0.31	—	Unverified
6	USR - DR (x = f)	Spearman Correlation	0.14	—	Unverified

USR-PersonaChat

#	Model	Metric	Claimed	Verified	Status
1	Lin-Reg (all)	Spearman Correlation	0.54	—	Unverified
2	USR - DR (x = c)	Spearman Correlation	0.48	—	Unverified
3	USR	Spearman Correlation	0.47	—	Unverified
4	USR - MLM	Spearman Correlation	0.08	—	Unverified
5	USR - DR (x = f)	Spearman Correlation	-0.05	—	Unverified
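
The tables above rank models by Spearman correlation, which for dialogue evaluation is typically computed between a metric's scores and human quality ratings for the same responses. The following is a minimal sketch of that computation using only the Python standard library. The function names (`average_ranks`, `spearman`) and the sample scores are hypothetical and for illustration only; they are not taken from any of the listed papers or from the leaderboard's own tooling.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Made-up example: automatic metric scores vs. human ratings for four responses.
metric_scores = [0.2, 0.9, 0.5, 0.7]
human_ratings = [1, 5, 3, 4]
rho = spearman(metric_scores, human_ratings)
```

Because only the rank order matters, a metric scores 1.0 whenever it orders responses exactly as the human raters do, regardless of scale. A negative value, such as the -0.05 in the second table, means the metric's ordering runs slightly against the human ordering.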