
Multiple-choice

Papers

Showing 211-220 of 1107 papers

Title | Status | Hype
AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models | Code | 1
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework | Code | 1
Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams | Code | 1
ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies | Code | 1
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages | Code | 1
Polishing Every Facet of the GEM: Testing Linguistic Competence of LLMs and Humans in Korean | Code | 1
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering | Code | 1
Explicit Planning Helps Language Models in Logical Reasoning | Code | 1
Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom | Code | 1
EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain | Code | 1

No leaderboard results yet.