SOTAVerified

Multiple-choice

Papers

Showing 161-170 of 1107 papers

Title | Status | Hype
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic | Code | 1
The Effect of Sampling Temperature on Problem Solving in Large Language Models | Code | 1
SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models | Code | 1
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models | Code | 1
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning | Code | 1
LongHealth: A Question Answering Benchmark with Long Clinical Documents | Code | 1
The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models | Code | 1
HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses | Code | 1
RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models | Code | 1
An In-depth Look at Gemini's Language Abilities | Code | 1

No leaderboard results yet.