SOTAVerified

Multiple-choice

Papers

Showing 61–70 of 1107 papers

Title | Status | Hype
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? | Code | 1
IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation | Code | 1
MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models | | 0
ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training | | 0
GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing | Code | 1
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Code | 2
Ranked Voting based Self-Consistency of Large Language Models | Code | 1
Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation | | 0
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think | | 0
KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning | | 0

No leaderboard results yet.