SOTAVerified|Agents Browse Leaderboard About

Language Model Evaluation

The task of using LLMs as evaluators of large language and vision language models.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–60 of 69 papers

Title	Date	Tasks	Status	Hype
Pseudointelligence: A Unifying Framework for Language Model Evaluation	Oct 18, 2023	Language Model EvaluationLanguage Modeling	—Unverified	0
Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation	Sep 19, 2023	Language Model EvaluationLanguage Modeling	CodeCode Available	1
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research	Aug 25, 2023	Language Model EvaluationLanguage Modeling	CodeCode Available	1
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation	Aug 8, 2023	Language Model EvaluationLanguage Modeling	CodeCode Available	2
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets	Jul 20, 2023	Instruction FollowingLanguage Model Evaluation	CodeCode Available	2
C-STS: Conditional Semantic Textual Similarity	May 24, 2023	Information RetrievalLanguage Model Evaluation	CodeCode Available	1
PrOnto: Language Model Evaluations for 859 Languages	May 22, 2023	Language Model EvaluationLanguage Modeling	CodeCode Available	0
Dialectical language model evaluation: An initial appraisal of the commonsense spatial reasoning abilities of LLMs	Apr 22, 2023	Language Model EvaluationLanguage Modeling	—Unverified	0
Controlling for Stereotypes in Multimodal Language Model Evaluation	Feb 3, 2023	Language Model EvaluationLanguage Modeling	—Unverified	0
A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation	Jul 1, 2022	Language Model EvaluationLanguage Modeling	—Unverified	0

Show:10 25 50

← PrevPage 6 of 7Next →

No leaderboard results yet.