SOTAVerified

Language Model Evaluation

The task of using LLMs as evaluators of large language models and vision-language models.

Papers

Showing 51–60 of 69 papers

Title | Status | Hype
Advancing Chinese biomedical text mining with community challenges | | 0
KMMLU: Measuring Massive Multitask Language Understanding in Korean | | 0
CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for Chinese Public Security Domain | | 0
Elo Uncovered: Robustness and Best Practices in Language Model Evaluation | | 0
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation | | 0
MedEval: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation | | 0
Is ChatGPT a Financial Expert? Evaluating Language Models on Financial Natural Language Processing | | 0
Pseudointelligence: A Unifying Framework for Language Model Evaluation | | 0
PrOnto: Language Model Evaluations for 859 Languages | Code | 0
Dialectical language model evaluation: An initial appraisal of the commonsense spatial reasoning abilities of LLMs | | 0

No leaderboard results yet.