SOTAVerified

Language Model Evaluation

The task of using LLMs as evaluators of large language models and vision-language models.

Papers

Showing 1–10 of 69 papers

| Title | Status | Hype |
| --- | --- | --- |
| Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code | Code | 4 |
| Evalverse: Unified and Accessible Library for Large Language Model Evaluation | Code | 3 |
| C^2LEVA: Toward Comprehensive and Contamination-Free Language Model Evaluation | Code | 2 |
| AgentSims: An Open-Source Sandbox for Large Language Model Evaluation | Code | 2 |
| FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets | Code | 2 |
| BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing | Code | 2 |
| Role-Playing Evaluation for Large Language Models | Code | 1 |
| M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis | Code | 1 |
| Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation | Code | 1 |
| Template Matters: Understanding the Role of Instruction Templates in Multimodal Language Model Evaluation and Training | Code | 1 |

No leaderboard results yet.