SOTAVerified

Language Model Evaluation

The task of using LLMs to evaluate large language models and vision-language models.

Papers

Showing 31–40 of 69 papers

| Title | Status | Hype |
| --- | --- | --- |
| iREPO: implicit Reward Pairwise Difference based Empirical Preference Optimization | | 0 |
| Lessons from the Trenches on Reproducible Evaluation of Language Models | | 0 |
| Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging | Code | 0 |
| Generalization Measures for Zero-Shot Cross-Lingual Transfer | | 0 |
| Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models | Code | 0 |
| Evalverse: Unified and Accessible Library for Large Language Model Evaluation | Code | 3 |
| Towards Personalized Evaluation of Large Language Models with An Anonymous Crowd-Sourcing Platform | Code | 0 |
| Rethinking Generative Large Language Model Evaluation for Semantic Comprehension | | 0 |
| Advancing Chinese biomedical text mining with community challenges | | 0 |
| ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic | Code | 1 |
Page 4 of 7

No leaderboard results yet.