Multiple-choice

Papers

Showing 281–290 of 1107 papers

Title	Date	Tasks	Status	Hype
SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text	Nov 25, 2024	Language Modeling	Unverified	0
AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset	Nov 23, 2024	Language Modeling	Unverified	0
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation	Nov 20, 2024	Chatbot, Multiple-choice	Unverified	0
Testing Uncertainty of Large Language Models for Physics Knowledge and Reasoning	Nov 18, 2024	Logical Reasoning, Multiple-choice	Unverified	0
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?	Nov 17, 2024	Multiple-choice	Code Available	1
A Benchmark for Long-Form Medical Question Answering	Nov 14, 2024	Answer Generation, Form	Code Available	0
DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine	Nov 14, 2024	Form, Hallucination	Code Available	0
TRACE: Transformer-based Risk Assessment for Clinical Evaluation	Nov 13, 2024	Decision Making, Missing Values	Code Available	0
SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents	Nov 12, 2024	General Knowledge, Hallucination	Unverified	0
IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark for LLMs	Nov 12, 2024	Coreference Resolution	Code Available	0
