Human Judgment Correlation

A task in which an algorithm generates judgment scores that correlate with human judgments.
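The benchmark results on this page are reported as Kendall's Tau-b and Tau-c. As a minimal sketch of how such a correlation could be computed between a metric's scores and human ratings, the pure-Python function below implements both variants. The function name `kendall_tau`, its parameters, and the example scores are illustrative assumptions and are not taken from any of the listed papers. The leaderboard values (for example 54.9) appear to be these coefficients multiplied by 100.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(metric_scores, human_scores, variant="b"):
    """Kendall rank correlation between metric scores and human ratings.

    Hypothetical helper for illustration; variant is "b" (tie-corrected)
    or "c" (Stuart's Tau-c, suited to ratings on a few discrete levels).
    """
    if len(metric_scores) != len(human_scores):
        raise ValueError("both score lists must have the same length")
    n = len(metric_scores)
    if n < 2:
        raise ValueError("at least two items are required")

    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(metric_scores, human_scores), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1  # pair tied on the metric score
        if dy == 0:
            ties_y += 1  # pair tied on the human rating
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1

    if variant == "b":
        n_pairs = n * (n - 1) // 2
        denom = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
        if denom == 0:
            raise ValueError("Tau-b is undefined when one variable is constant")
        return (concordant - discordant) / denom
    if variant == "c":
        # m is the smaller number of distinct values across the two variables
        m = min(len(set(metric_scores)), len(set(human_scores)))
        if m < 2:
            raise ValueError("Tau-c is undefined when one variable is constant")
        return 2 * (concordant - discordant) / (n * n * (m - 1) / m)
    raise ValueError("variant must be 'b' or 'c'")


# Made-up example: five captions scored by a metric and rated by a human (1-4).
example_metric = [0.9, 0.4, 0.7, 0.2, 0.6]
example_human = [4, 2, 3, 1, 3]
tau_b = kendall_tau(example_metric, example_human, variant="b")
tau_c = kendall_tau(example_metric, example_human, variant="c")
print(f"Tau-b: {tau_b:.3f}, Tau-c: {tau_c:.3f}")
```

Tau-b corrects for tied pairs in either variable, while Tau-c normalizes by the number of distinct rating levels, which is why the two coefficients differ on the same data.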

Papers

Showing 1–5 of 5 papers

Title	Date	Tasks	Status	Hype
PerSEval: Assessing Personalization in Text Summarizers	Jun 29, 2024	Benchmarking, Human Judgment Correlation	Unverified	0
FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing	May 27, 2023	Graph Similarity, Human Judgment Correlation	Code Available	1
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models	May 25, 2022	Hallucination Pair-wise Detection (1-ref), Hallucination Pair-wise Detection (4-ref)	Code Available	1
CLIPScore: A Reference-free Evaluation Metric for Image Captioning	Apr 18, 2021	Hallucination Pair-wise Detection (1-ref), Hallucination Pair-wise Detection (4-ref)	Code Available	1
Improving Image Captioning Evaluation by Considering Inter References Variance	Jul 1, 2020	Human Judgment Correlation, Image Captioning	Unverified	0

Datasets: Flickr8k-Expert, Flickr8k-CF

Benchmark Results

Flickr8k-Expert

#	Model	Metric	Claimed	Verified	Status
1	MID	Kendall's Tau-c	54.9	—	Unverified
2	SoftSPICE	Kendall's Tau-c	54.2	—	Unverified
3	RefCLIP-S	Kendall's Tau-c	53	—	Unverified
4	CLIP-S	Kendall's Tau-c	51.2	—	Unverified

Flickr8k-CF

#	Model	Metric	Claimed	Verified	Status
1	MID	Kendall's Tau-b	37.3	—	Unverified
2	RefCLIP-S	Kendall's Tau-b	36.4	—	Unverified
3	CLIP-S	Kendall's Tau-b	34.4	—	Unverified