SOTAVerified

Multiple-choice

Papers

Showing 301-310 of 1107 papers

Title | Status | Hype
Enhancing LLM Evaluations: The Garbling Trick | | 0
Benchmarking Bias in Large Language Models during Role-Playing | | 0
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest | | 0
Improving Model Evaluation using SMART Filtering of Benchmark Datasets | Code | 3
GPT-4o System Card | | 0
Delving into the Reversal Curse: How Far Can Large Language Models Generalize? | Code | 1
Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare | | 0
Large Language Models Still Exhibit Bias in Long Text | | 0
GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks | | 0
How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making? | Code | 0

No leaderboard results yet.