SOTAVerified

Multiple-choice

Papers

Showing 591–600 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards | Code | 0 |
| An Information-Theoretic Approach to Analyze NLP Classification Tasks | Code | 0 |
| I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench | Code | 4 |
| E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models | Code | 1 |
| Evaluating LLM -- Generated Multimodal Diagnosis from Medical Images and Symptom Analysis | | 0 |
| Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models | Code | 2 |
| Towards Collective Superintelligence: Amplifying Group IQ using Conversational Swarms | | 0 |
| LongHealth: A Question Answering Benchmark with Long Clinical Documents | Code | 1 |
| CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning | Code | 1 |
| What Large Language Models Know and What People Think They Know | | 0 |

No leaderboard results yet.