SOTAVerified|Agents Browse Leaderboard About

Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 501–510 of 1107 papers

Title	Date	Tasks	Status	Hype
IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark for LLMs	Nov 12, 2024	coreference-resolutionCoreference Resolution	CodeCode Available	0
SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents	Nov 12, 2024	General KnowledgeHallucination	—Unverified	0
Probabilistic Consensus through Ensemble Validation: A Framework for LLM Reliability	Nov 10, 2024	Multiple-choiceText Generation	—Unverified	0
Quantitative Assessment of Intersectional Empathetic Bias and Understanding	Nov 8, 2024	Multiple-choice	CodeCode Available	0
Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators	Nov 8, 2024	Decision MakingMultiple-choice	—Unverified	0
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding	Nov 7, 2024	BenchmarkingMultiple-choice	—Unverified	0
FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees	Nov 4, 2024	Multiple-choiceQuestion Answering	—Unverified	0
Enhancing LLM Evaluations: The Garbling Trick	Nov 3, 2024	Multiple-choice	—Unverified	0
Benchmarking Bias in Large Language Models during Role-Playing	Nov 1, 2024	BenchmarkingFairness	—Unverified	0
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest	Oct 27, 2024	Medical Visual Question AnsweringMultiple-choice	—Unverified	0

Show:10 25 50

← PrevPage 51 of 111Next →

No leaderboard results yet.