
Multiple-choice

Papers


Showing 161–170 of 1107 papers

Title	Date	Tasks	Status	Hype
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic	Feb 20, 2024	ArabicMMLU, Language Model Evaluation	Code Available	1
The Effect of Sampling Temperature on Problem Solving in Large Language Models	Feb 7, 2024	Multiple-choice, Prompt Engineering	Code Available	1
SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models	Feb 6, 2024	Attribute, Face Anti-Spoofing	Code Available	1
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models	Jan 29, 2024	Ethics, Multiple-choice	Code Available	1
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning	Jan 25, 2024	Multiple-choice, Position	Code Available	1
LongHealth: A Question Answering Benchmark with Long Clinical Documents	Jan 25, 2024	Information Retrieval, Multiple-choice	Code Available	1
The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models	Jan 11, 2024	Math, Multiple-choice	Code Available	1
HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses	Dec 26, 2023	Diversity, Knowledge Graphs	Code Available	1
RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models	Dec 26, 2023	Memorization, Multiple-choice	Code Available	1
An In-depth Look at Gemini's Language Abilities	Dec 18, 2023	Instruction Following, Math	Code Available	1



No leaderboard results yet.