SOTAVerified

Human Judgment Correlation

A task in which an algorithm produces judgment scores that should correlate with human judgments; a method is evaluated by how strongly its scores agree with human ratings of the same items.
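As a rough illustration of how such agreement might be measured, the sketch below computes Kendall's tau between a metric's scores and human ratings in plain Python. The function name `kendall_tau`, its `variant` parameter, and the toy scores are hypothetical and not taken from any listed paper; the benchmarks may use a different implementation (for example a statistics library) and real annotation data.

```python
import math
from itertools import combinations


def kendall_tau(metric_scores, human_scores, variant="b"):
    """Kendall's tau-b or tau-c between two equal-length score lists.

    Illustrative O(n^2) pair-counting sketch, not an optimized routine.
    """
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    n = len(metric_scores)
    if n < 2:
        return float("nan")

    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(metric_scores, human_scores), 2):
        sx = (x1 > x2) - (x1 < x2)  # sign of the metric-score difference
        sy = (y1 > y2) - (y1 < y2)  # sign of the human-score difference
        if sx == 0 and sy == 0:
            continue                # tied in both lists: counted nowhere
        elif sx == 0:
            tied_x_only += 1
        elif sy == 0:
            tied_y_only += 1
        elif sx == sy:
            concordant += 1
        else:
            discordant += 1

    diff = concordant - discordant
    if variant == "b":
        # Pairs not tied in x, and pairs not tied in y.
        untied_x = concordant + discordant + tied_y_only
        untied_y = concordant + discordant + tied_x_only
        denom = math.sqrt(untied_x * untied_y)
    elif variant == "c":
        # m is the smaller number of distinct values in either list.
        m = min(len(set(metric_scores)), len(set(human_scores)))
        denom = n * n * (m - 1) / (2 * m)
    else:
        raise ValueError("variant must be 'b' or 'c'")
    return diff / denom if denom else float("nan")


# Toy data (assumed): four items, human ratings on a coarse scale with ties.
metric = [0.9, 0.4, 0.7, 0.2]
human = [3, 1, 3, 1]
print(round(kendall_tau(metric, human, "b"), 4))  # 0.8165
print(round(kendall_tau(metric, human, "c"), 4))  # 1.0
```

The two variants diverge here because the human ratings contain ties: tau-b corrects for tied pairs, while tau-c normalizes by the number of distinct rating levels.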

Papers

Showing 1-5 of 5 papers

| Title | Status | Hype |
|---|---|---|
| PerSEval: Assessing Personalization in Text Summarizers | | 0 |
| FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing | Code | 1 |
| Mutual Information Divergence: A Unified Metric for Multimodal Generative Models | Code | 1 |
| CLIPScore: A Reference-free Evaluation Metric for Image Captioning | Code | 1 |
| Improving Image Captioning Evaluation by Considering Inter References Variance | | 0 |

Benchmark Results

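| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MID | Kendall's Tau-c | 54.9 | | Unverified |
| 2 | SoftSPICE | Kendall's Tau-c | 54.2 | | Unverified |
| 3 | RefCLIP-S | Kendall's Tau-c | 53 | | Unverified |
| 4 | CLIP-S | Kendall's Tau-c | 51.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MID | Kendall's Tau-b | 37.3 | | Unverified |
| 2 | RefCLIP-S | Kendall's Tau-b | 36.4 | | Unverified |
| 3 | CLIP-S | Kendall's Tau-b | 34.4 | | Unverified |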
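
For reference, the two metrics named in the results are commonly defined as below. The symbols are introduced here for illustration and do not come from the page: $n$ is the number of scored items, $n_c$ and $n_d$ the numbers of concordant and discordant pairs, $n_0 = n(n-1)/2$ the total number of pairs, $n_1$ and $n_2$ the numbers of pairs tied in the metric scores and in the human scores respectively, and $m$ the smaller number of distinct values in either list.

```latex
\tau_b = \frac{n_c - n_d}{\sqrt{(n_0 - n_1)\,(n_0 - n_2)}},
\qquad
\tau_c = \frac{2\,(n_c - n_d)}{n^{2}\,\dfrac{m - 1}{m}}
```

Both coefficients lie in $[-1, 1]$, so the claimed values listed in the results are presumably reported on a scale multiplied by 100 (an inference from the value range, not something stated on the page).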