SOTAVerified

Multiple-choice

Papers

Showing 511-520 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| GPT-4o System Card | | 0 |
| Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare | | 0 |
| Large Language Models Still Exhibit Bias in Long Text | | 0 |
| GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks | | 0 |
| How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making? | Code | 0 |
| Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S. | | 0 |
| Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models | | 0 |
| LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs | | 0 |
| CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy | | 0 |
| LAR-ECHR: A New Legal Argument Reasoning Task and Dataset for Cases of the European Court of Human Rights | | 0 |

No leaderboard results yet.