
Language Model Evaluation

The task of using LLMs as evaluators of large language models and vision language models.

Papers


Showing 51–60 of 69 papers

Title	Date	Tasks	Status	Hype	Score
CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for Chinese Public Security Domain	Feb 11, 2024	Language Model Evaluation, Language Modeling	Unverified	0	0
DnA-Eval: Enhancing Large Language Model Evaluation through Decomposition and Aggregation	May 24, 2024	Language Model Evaluation, Language Modeling	Unverified	0	0
Dialectical language model evaluation: An initial appraisal of the commonsense spatial reasoning abilities of LLMs	Apr 22, 2023	Language Model Evaluation, Language Modeling	Unverified	0	0
Elo Uncovered: Robustness and Best Practices in Language Model Evaluation	Nov 29, 2023	Language Model Evaluation, Language Modeling	Unverified	0	0
Enterprise Large Language Model Evaluation Benchmark	Jun 25, 2025	Language Model Evaluation, Language Modeling	Unverified	0	0
Finance Language Model Evaluation (FLaME)	Jun 18, 2025	Benchmarking, Language Model Evaluation	Unverified	0	0
Generalization Measures for Zero-Shot Cross-Lingual Transfer	Apr 24, 2024	Cross-Lingual Transfer, Language Model Evaluation	Unverified	0	0
Improving Explainable Recommendations with Synthetic Reviews	Jul 18, 2018	Language Model Evaluation, Language Modeling	Unverified	0	0
iREPO: implicit Reward Pairwise Difference based Empirical Preference Optimization	May 24, 2024	Language Model Evaluation, Language Modeling	Unverified	0	0
Is ChatGPT a Financial Expert? Evaluating Language Models on Financial Natural Language Processing	Oct 19, 2023	Decoder, Language Model Evaluation	Unverified	0	0



No leaderboard results yet.