SOTAVerified

Multiple-choice

Papers

Showing 71-80 of 1107 papers

Title | Status | Hype
SafePath: Conformal Prediction for Safe LLM-Based Autonomous Navigation | - | 0
Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora | Code | 0
HealthBench: Evaluating Large Language Models Towards Improved Human Health | Code | 7
Benchmarking AI scientists in omics data-driven biological research | Code | 1
VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models | Code | 0
Assessing the Chemical Intelligence of Large Language Models | Code | 1
How well do LLMs reason over tabular data, really? | - | 0
Tell Me Who Your Students Are: GPT Can Generate Valid Multiple-Choice Questions When Students' (Mis)Understanding Is Hinted | - | 0
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information | - | 0
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning | Code | 2

No leaderboard results yet.