SOTAVerified

Multiple-choice

Papers

Showing 611-620 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models | | 0 |
| Framing QA as Building and Ranking Intersentence Answer Justifications | | 0 |
| From ChatGPT to DeepSeek AI: A Comprehensive Analysis of Evolution, Deviation, and Future Implications in AI-Language Models | | 0 |
| From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project | | 0 |
| From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | | 0 |
| SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents | | 0 |
| Fundamental Limitations in Defending LLM Finetuning APIs | | 0 |
| FusionMind -- Improving question and answering with external context fusion | | 0 |
| GANDALF: a General Character Name Description Dataset for Long Fiction | | 0 |
| GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis | | 0 |

No leaderboard results yet.