SOTAVerified

Multiple-choice

Papers

Showing 621–630 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation | | 0 |
| Do LLMs Recognize me, When I is not me: Assessment of LLMs Understanding of Turkish Indexical Pronouns in Indexical Shift Contexts | | 0 |
| CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models | Code | 0 |
| LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs | Code | 0 |
| Every Answer Matters: Evaluating Commonsense with Probabilistic Measures | Code | 0 |
| M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering | Code | 0 |
| Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? | | 0 |
| Automating Turkish Educational Quiz Generation Using Large Language Models | Code | 0 |
| Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data | Code | 0 |
| Order-Independence Without Fine Tuning | Code | 0 |

No leaderboard results yet.