SOTAVerified

Multiple-choice

Papers

Showing 461–470 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding | Code | 1 |
| LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs | Code | 0 |
| CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models | Code | 0 |
| M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering | Code | 0 |
| Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? | | 0 |
| Every Answer Matters: Evaluating Commonsense with Probabilistic Measures | Code | 0 |
| Automating Turkish Educational Quiz Generation Using Large Language Models | Code | 0 |
| Order-Independence Without Fine Tuning | Code | 0 |
| TopViewRS: Vision-Language Models as Top-View Spatial Reasoners | Code | 1 |
| Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data | Code | 0 |

No leaderboard results yet.