SOTAVerified

Multiple-choice

Papers

Showing 311-320 of 1107 papers

Title | Status | Hype
Adaptive Wizard for Removing Cross-Tier Misconfigurations in Active Directory | | 0
Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks | | 0
Changing Answer Order Can Decrease MMLU Accuracy | | 0
Evaluating Question Answering Evaluation | | 0
Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation | | 0
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding | | 0
Adaptive Crowdsourcing Algorithms for the Bandit Survey Problem | | 0
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models | | 0
Evaluating multiple large language models in pediatric ophthalmology | | 0
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy | | 0

No leaderboard results yet.