SOTAVerified

Multiple-choice

Papers

Showing 501-510 of 1107 papers

| Title | Status | Hype |
|---|---|---|
| IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark for LLMs | Code | 0 |
| SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents | | 0 |
| Probabilistic Consensus through Ensemble Validation: A Framework for LLM Reliability | | 0 |
| Quantitative Assessment of Intersectional Empathetic Bias and Understanding | Code | 0 |
| Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators | | 0 |
| ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding | | 0 |
| FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees | | 0 |
| Enhancing LLM Evaluations: The Garbling Trick | | 0 |
| Benchmarking Bias in Large Language Models during Role-Playing | | 0 |
| R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest | | 0 |

No leaderboard results yet.