SOTAVerified|Agents Browse Leaderboard About

Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 111–120 of 1107 papers

Title	Date	Tasks	Status	Hype
Taming Overconfidence in LLMs: Reward Calibration in RLHF	Oct 13, 2024	Multiple-choice	CodeCode Available	1
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models	Oct 11, 2024	Few-Shot LearningMultiple-choice	CodeCode Available	1
MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework	Oct 2, 2024	BenchmarkingInstruction Following	CodeCode Available	1
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning	Oct 1, 2024	Common Sense ReasoningDeepFake Detection	CodeCode Available	1
Boosting Healthcare LLMs Through Retrieved Context	Sep 23, 2024	BenchmarkingMultiple-choice	CodeCode Available	1
Annealed Winner-Takes-All for Motion Forecasting	Sep 17, 2024	AllAutonomous Driving	CodeCode Available	1
Training on the Benchmark Is Not All You Need	Sep 3, 2024	AllMultiple-choice	CodeCode Available	1
TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein Sequences for Protein Engineering	Aug 27, 2024	Multiple-choiceProtein Folding	CodeCode Available	1
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement	Aug 23, 2024	DisentanglementKnowledge Tracing	CodeCode Available	1
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs	Aug 16, 2024	Instruction FollowingMultiple-choice	CodeCode Available	1

Show:10 25 50

← PrevPage 12 of 111Next →

No leaderboard results yet.