SOTAVerified

Language Model Evaluation

The task of using LLMs as evaluators of large language models and vision-language models.

Papers

Showing 11–20 of 69 papers

Title | Status | Hype
DART-Eval: A Comprehensive DNA Language Model Evaluation Benchmark on Regulatory DNA | Code | 1
Salmon: A Suite for Acoustic Language Model Evaluation | Code | 1
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic | Code | 1
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation | Code | 1
LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction | Code | 1
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets | Code | 1
Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation | Code | 1
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research | Code | 1
C-STS: Conditional Semantic Textual Similarity | Code | 1
ZJUKLAB at SemEval-2021 Task 4: Negative Augmentation with Language Model for Reading Comprehension of Abstract Meaning | Code | 1

No leaderboard results yet.