SOTAVerified

Language Model Evaluation

The task of using LLMs as evaluators of large language models and vision-language models.

Papers

Showing 1–10 of 69 papers

Title | Status | Hype
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code | Code | 4
Evalverse: Unified and Accessible Library for Large Language Model Evaluation | Code | 3
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation | Code | 2
C^2LEVA: Toward Comprehensive and Contamination-Free Language Model Evaluation | Code | 2
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing | Code | 2
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets | Code | 2
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation | Code | 1
Salmon: A Suite for Acoustic Language Model Evaluation | Code | 1
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic | Code | 1
LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction | Code | 1

No leaderboard results yet.