SOTAVerified

Multiple-choice

Papers

Showing 111-120 of 1107 papers

Title | Status | Hype
Taming Overconfidence in LLMs: Reward Calibration in RLHF | Code | 1
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models | Code | 1
MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework | Code | 1
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning | Code | 1
Boosting Healthcare LLMs Through Retrieved Context | Code | 1
Annealed Winner-Takes-All for Motion Forecasting | Code | 1
Training on the Benchmark Is Not All You Need | Code | 1
TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein Sequences for Protein Engineering | Code | 1
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs | Code | 1

No leaderboard results yet.