SOTAVerified

Multiple-choice

Papers

Showing 401–425 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta | | 0 |
| Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses? | | 0 |
| Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation | | 0 |
| Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge | | 0 |
| Answering questions by learning to rank -- Learning to rank by answering questions | | 0 |
| How Additional Knowledge can Improve Natural Language Commonsense Question Answering? | | 0 |
| Enhancing Multiple-Choice Question Answering with Causal Knowledge | | 0 |
| Exposing the Limits of Video-Text Models through Contrast Sets | | 0 |
| Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No! | | 0 |
| Enhancing Multiple-choice Machine Reading Comprehension by Punishing Illogical Interpretations | | 0 |
| FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees | | 0 |
| Answering Chinese Elementary School Social Studies Multiple Choice Questions | | 0 |
| FAMULUS: Interactive Annotation and Feedback Generation for Teaching Diagnostic Reasoning | | 0 |
| FarsEval-PKBETS: A new diverse benchmark for evaluating Persian large language models | | 0 |
| Enhancing LLMs' Reasoning-Intensive Multimedia Search Capabilities through Fine-Tuning and Reinforcement Learning | | 0 |
| AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic | | 0 |
| FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding | | 0 |
| Enhancing LLM Evaluations: The Garbling Trick | | 0 |
| Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models | | 0 |
| Field-testing items using artificial intelligence: Natural language processing with transformers | | 0 |
| Answering Chinese Elementary School Social Study Multiple Choice Questions | | 0 |
| Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework | | 0 |
| Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering | | 0 |
| Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration | | 0 |
| Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | | 0 |
Page 17 of 45

No leaderboard results yet.