SOTAVerified

Dialogue Evaluation

Papers

Showing 41–50 of 97 papers

| Title | Status | Hype |
|---|---|---|
| PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment | | 0 |
| Pragmatically Appropriate Diversity for Dialogue Evaluation | | 0 |
| Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers | | 0 |
| Dialogue Evaluation with Offline Reinforcement Learning | | 0 |
| RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue | | 0 |
| Re-evaluating ADEM: A Deeper Look at Scoring Dialogue Responses | | 0 |
| Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges | | 0 |
| Dialogue You Can Trust: Human and AI Perspectives on Generated Conversations | | 0 |
| DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation | | 0 |
| Enhancing the Open-Domain Dialogue Evaluation in Latent Space | | 0 |
Page 5 of 10

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MDD-Eval | Spearman Correlation | 0.51 | | Unverified |
| 2 | Lin-Reg (all) | Spearman Correlation | 0.49 | | Unverified |
| 3 | USR | Spearman Correlation | 0.42 | | Unverified |
| 4 | USR - DR (x = c) | Spearman Correlation | 0.32 | | Unverified |
| 5 | USR - MLM | Spearman Correlation | 0.31 | | Unverified |
| 6 | USR - DR (x = f) | Spearman Correlation | 0.14 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Lin-Reg (all) | Spearman Correlation | 0.54 | | Unverified |
| 2 | USR - DR (x = c) | Spearman Correlation | 0.48 | | Unverified |
| 3 | USR | Spearman Correlation | 0.47 | | Unverified |
| 4 | USR - MLM | Spearman Correlation | 0.08 | | Unverified |
| 5 | USR - DR (x = f) | Spearman Correlation | -0.05 | | Unverified |
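All of the scores above are Spearman rank correlations between a metric's automatic scores and human quality ratings. As a minimal sketch of how such a number is computed (the score and rating values below are illustrative, not taken from the leaderboard):

```python
def ranks(xs):
    """Assign 1-based ranks, averaging ranks across ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical example: metric scores vs. human ratings per dialogue response.
metric_scores = [0.9, 0.2, 0.6, 0.4, 0.8]
human_ratings = [5, 1, 4, 2, 4]
rho = spearman(metric_scores, human_ratings)
```

In practice these correlations are typically computed with `scipy.stats.spearmanr`; the pure-Python version above just makes the rank-then-correlate logic explicit.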