SOTAVerified

Multiple-choice

Papers

Showing 231-240 of 1107 papers

Title | Status | Hype
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning | Code | 1
JMedLoRA: Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning | Code | 1
Leveraging Large Language Models for Multiple Choice Question Answering | Code | 1
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models | Code | 1
Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs | Code | 1
IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation | Code | 1
CUPCase: Clinically Uncommon Patient Cases and Diagnoses Dataset | Code | 1
Training on the Benchmark Is Not All You Need | Code | 1
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training | Code | 1
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models | Code | 1

No leaderboard results yet.