SOTAVerified

Multiple-choice

Papers

Showing 181-190 of 1107 papers

Title | Status | Hype
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models | Code | 1
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages | Code | 1
Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs | Code | 1
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting | Code | 1
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework | Code | 1
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1
Boosting Healthcare LLMs Through Retrieved Context | Code | 1
BRAINTEASER: Lateral Thinking Puzzles for Large Language Models | Code | 1
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering | Code | 1
IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages | Code | 1

No leaderboard results yet.