SOTAVerified

Human Judgment Classification

A task in which an algorithm judges which of the candidate samples is better, with human judgment serving as the ground truth.
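
The task description can be illustrated with a minimal sketch of how agreement with human judgment might be measured. It assumes that samples come in pairs, that each pair carries a label for the sample humans preferred, and that the algorithm is a scalar scoring function. The names `judgment_accuracy` and `toy_score` and the example data are hypothetical, and the sketch does not reproduce the exact protocol behind the benchmark's Mean Accuracy figures.

```python
from typing import Callable, Sequence, Tuple

# Each item: (sample A, sample B, index of the sample humans preferred: 0 or 1).
Pair = Tuple[str, str, int]


def judgment_accuracy(pairs: Sequence[Pair], score: Callable[[str], float]) -> float:
    """Fraction of pairs where the higher-scored sample matches the human choice."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    correct = 0
    for sample_a, sample_b, human_choice in pairs:
        # The algorithm "judges" by picking the sample with the higher score.
        predicted = 0 if score(sample_a) >= score(sample_b) else 1
        correct += int(predicted == human_choice)
    return correct / len(pairs)


def toy_score(caption: str) -> float:
    # Hypothetical stand-in for a real metric: longer captions score higher.
    return float(len(caption.split()))


# Made-up example data, for illustration only.
pairs = [
    ("a dog runs on the beach", "dog", 0),
    ("cat", "a cat sleeps on a sofa", 1),
    ("a red car parked outside", "car", 1),
]

accuracy = judgment_accuracy(pairs, toy_score)
print(f"accuracy = {accuracy:.3f}")
```

In practice `toy_score` would be replaced by the metric under evaluation, and the pairs by a dataset of human preference annotations.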

Papers

Showing 1-2 of 2 papers

Title | Status | Hype
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models | Code | 1
CLIPScore: A Reference-free Evaluation Metric for Image Captioning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MID | Mean Accuracy | 85.2 | - | Unverified
2 | RefCLIP-S | Mean Accuracy | 83.1 | - | Unverified
3 | CLIP-S | Mean Accuracy | 80.7 | - | Unverified