SOTAVerified

Language Model Evaluation

The task of using LLMs to evaluate large language models and vision-language models.

Papers

Showing 41–50 of 69 papers

Title	Date	Tasks	Status	Hype
Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation	Jun 20, 2024	GSM8K, Language Model Evaluation	Code Available	0
Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation	Jun 6, 2024	Language Model Evaluation, Language Modeling	Unverified	0
DnA-Eval: Enhancing Large Language Model Evaluation through Decomposition and Aggregation	May 24, 2024	Language Model Evaluation, Language Modeling	Unverified	0
iREPO: implicit Reward Pairwise Difference based Empirical Preference Optimization	May 24, 2024	Language Model Evaluation, Language Modeling	Unverified	0
Lessons from the Trenches on Reproducible Evaluation of Language Models	May 23, 2024	Language Model Evaluation, Language Modeling	Unverified	0
Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging	May 20, 2024	Language Model Evaluation, Language Modeling	Code Available	0
Generalization Measures for Zero-Shot Cross-Lingual Transfer	Apr 24, 2024	Cross-Lingual Transfer, Language Model Evaluation	Unverified	0
Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models	Apr 17, 2024	Form, Language Model Evaluation	Code Available	0
Towards Personalized Evaluation of Large Language Models with An Anonymous Crowd-Sourcing Platform	Mar 13, 2024	Language Model Evaluation, Language Modelling	Code Available	0
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension	Mar 12, 2024	Language Model Evaluation, Language Modeling	Unverified	0

No leaderboard results yet.