SOTAVerified

Multiple-choice

Papers

Showing 981-990 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge | | 0 |
| How Additional Knowledge can Improve Natural Language Commonsense Question Answering? | | 0 |
| Exposing the Limits of Video-Text Models through Contrast Sets | | 0 |
| Towards Multilingual LLM Evaluation for Baltic and Nordic languages: A study on Lithuanian History | | 0 |
| FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees | | 0 |
| Towards Multistage Design of Modular Systems | | 0 |
| FAMULUS: Interactive Annotation and Feedback Generation for Teaching Diagnostic Reasoning | | 0 |
| FarsEval-PKBETS: A new diverse benchmark for evaluating Persian large language models | | 0 |
| Town Hall Debate Prompting: Enhancing Logical Reasoning in LLMs through Multi-Persona Interaction | | 0 |
| FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding | | 0 |

No leaderboard results yet.