SOTAVerified

Multiple-choice

Papers

Showing 201-210 of 1107 papers

Title | Status | Hype
--- | --- | ---
Can large language models reason about medical questions? | Code | 1
Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models | Code | 1
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic | Code | 1
FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue | Code | 1
Option Tracing: Beyond Correctness Analysis in Knowledge Tracing | Code | 1
ORAN-Bench-13K: An Open Source Benchmark for Assessing LLMs in Open Radio Access Networks | Code | 1
AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models | Code | 1
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models | Code | 1
JMedLoRA:Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning | Code | 1
Large Language Models Encode Clinical Knowledge | Code | 1

No leaderboard results yet.