SOTAVerified

Multiple-choice

Papers

Showing 371–380 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Evaluating LLM -- Generated Multimodal Diagnosis from Medical Images and Symptom Analysis | | 0 |
| Evaluating LLMs on Document-Based QA: Exact Answer Selection and Numerical Extraction using Cogtale dataset | | 0 |
| Evaluating Machine Reading Systems through Comprehension Tests | | 0 |
| First Token Probability Guided RAG for Telecom Question Answering | | 0 |
| Evalita-LLM: Benchmarking Large Language Models on Italian | | 0 |
| Evaluating Nuanced Bias in Large Language Model Free Response Answers | | 0 |
| BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles | | 0 |
| Evaluating Question Answering Evaluation | | 0 |
| Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs | | 0 |
| Establishing Task Scaling Laws via Compute-Efficient Model Ladders | | 0 |

No leaderboard results yet.