SOTAVerified

Language Model Evaluation

The task of using LLMs as evaluators of large language models and vision-language models.

Papers

Showing 51-60 of 69 papers

| Title | Status | Hype |
| --- | --- | --- |
| R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation | | 0 |
| Rethinking Generative Large Language Model Evaluation for Semantic Comprehension | | 0 |
| Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation | | 0 |
| Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation | | 0 |
| UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | | 0 |
| ViDAS: Vision-based Danger Assessment and Scoring | | 0 |
| Lessons from the Trenches on Reproducible Evaluation of Language Models | | 0 |
| LMUnit: Fine-grained Evaluation with Natural Language Unit Tests | | 0 |
| Large Language Model Evaluation via Matrix Nuclear-Norm | Code | 0 |
| PrOnto: Language Model Evaluations for 859 Languages | Code | 0 |

No leaderboard results yet.