SOTAVerified

Multiple-choice

Papers

Showing 251-260 of 1107 papers

Title | Status | Hype
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? | Code | 1
LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models | Code | 1
WIQA: A dataset for "What if..." reasoning over procedural text | Code | 1
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation | Code | 1
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Code | 1
LongHealth: A Question Answering Benchmark with Long Clinical Documents | Code | 1
MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework | Code | 1
Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis | Code | 0
A Study on Large Language Models' Limitations in Multiple-Choice Question Answering | Code | 0
LiveQA: A Question Answering Dataset over Sports Live | Code | 0

No leaderboard results yet.