SOTAVerified

Language Model Evaluation

The task of using large language models (LLMs) as evaluators of large language models and vision-language models.

Papers

Showing 61–69 of 69 papers

Title | Status | Hype
Controlling for Stereotypes in Multimodal Language Model Evaluation | | 0
A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation | | 0
BPoMP: The Benchmark of Poetic Minimal Pairs – Limericks, Rhyme, and Narrative Coherence | | 0
Language Model Evaluation in Open-ended Text Generation | | 0
Language Model Evaluation Beyond Perplexity | | 0
Mind the Gap: Assessing Temporal Generalization in Neural Language Models | | 0
CLiMP: A Benchmark for Chinese Language Model Evaluation | | 0
Improving Explainable Recommendations with Synthetic Reviews | | 0
Contrastive Entropy: A new evaluation metric for unnormalized language models | | 0
Page 7 of 7

No leaderboard results yet.