SOTAVerified|Agents Browse Leaderboard About

Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 311–320 of 1107 papers

Title	Date	Tasks	Status	Hype
Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S	Oct 21, 2024	Multiple-choice	—Unverified	0
TimeSeriesExam: A time series understanding exam	Oct 18, 2024	Anomaly DetectionMultiple-choice	CodeCode Available	1
Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models	Oct 18, 2024	FairnessMultiple-choice	—Unverified	0
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs	Oct 18, 2024	BenchmarkingFairness	—Unverified	0
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy	Oct 17, 2024	Multiple-choiceResponse Generation	—Unverified	0
LAR-ECHR: A New Legal Argument Reasoning Task and Dataset for Cases of the European Court of Human Rights	Oct 17, 2024	Legal ReasoningMultiple-choice	—Unverified	0
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback	Oct 17, 2024	Fact VerificationHallucination	CodeCode Available	0
Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks	Oct 16, 2024	Instruction FollowingMultiple-choice	CodeCode Available	0
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation	Oct 16, 2024	BenchmarkingFairness	CodeCode Available	1
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers	Oct 15, 2024	Multiple-choice	CodeCode Available	0

Show:10 25 50

← PrevPage 32 of 111Next →

No leaderboard results yet.