SOTAVerified

Language Model Evaluation

The task of using LLMs as evaluators of large language models and vision-language models.

Papers

Showing 31-40 of 69 papers

| Title | Status | Hype |
| --- | --- | --- |
| Large Language Model Evaluation via Matrix Nuclear-Norm | Code | 0 |
| Pseudointelligence: A Unifying Framework for Language Model Evaluation | | 0 |
| R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation | | 0 |
| Rethinking Generative Large Language Model Evaluation for Semantic Comprehension | | 0 |
| Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation | | 0 |
| Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation | | 0 |
| UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | | 0 |
| ViDAS: Vision-based Danger Assessment and Scoring | | 0 |
| KMMLU: Measuring Massive Multitask Language Understanding in Korean | | 0 |
| Advancing Chinese biomedical text mining with community challenges | | 0 |

No leaderboard results yet.