SOTAVerified

Multiple-choice

Papers

Showing 351-375 of 1107 papers

Title | Status | Hype
Bridging Information-Seeking Human Gaze and Machine Reading Comprehension | | 0
Adapting Vision-Language Models for Evaluating World Models | | 0
From ChatGPT to DeepSeek AI: A Comprehensive Analysis of Evolution, Deviation, and Future Implications in AI-Language Models | | 0
GANDALF: a General Character Name Description Dataset for Long Fiction | | 0
Generating Diagnostic Multiple Choice Comprehension Cloze Questions | | 0
Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs | | 0
AI-based Arabic Language and Speech Tutor | | 0
Answering Science Exam Questions Using Query Reformulation with Background Knowledge | | 0
ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition | | 0
Answering Science Exam Questions Using Query Rewriting with Background Knowledge | | 0
BloomVQA: Assessing Hierarchical Multi-modal Comprehension | | 0
AI and Machine Learning for Next Generation Science Assessments | | 0
Evaluating LLM-corrupted Crowdsourcing Data Without Ground Truth | | 0
Answering Questions in Stages: Prompt Chaining for Contract QA | | 0
BLINK: Multimodal Large Language Models Can See but Not Perceive | | 0
ACQ: A Unified Framework for Automated Programmatic Creativity in Online Advertising | | 0
Answering questions by learning to rank - Learning to rank by answering questions | | 0
Evalita-LLM: Benchmarking Large Language Models on Italian | | 0
BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles | | 0
Establishing Task Scaling Laws via Compute-Efficient Model Ladders | | 0
Evaluating LLM -- Generated Multimodal Diagnosis from Medical Images and Symptom Analysis | | 0
Evaluating LLMs on Document-Based QA: Exact Answer Selection and Numerical Extraction using Cogtale dataset | | 0
Evaluating Machine Reading Systems through Comprehension Tests | | 0
EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta | | 0
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation | | 0

No leaderboard results yet.