Multiple-choice

Papers

Showing 76–100 of 1107 papers

Title	Date	Tasks	Status	Hype
How well do LLMs reason over tabular data, really?	May 12, 2025	Missing Values, Multiple-choice	Unverified	0
Assessing the Chemical Intelligence of Large Language Models	May 12, 2025	Multiple-choice	Code Available	1
Tell Me Who Your Students Are: GPT Can Generate Valid Multiple-Choice Questions When Students' (Mis)Understanding Is Hinted	May 9, 2025	Language Modeling, Language Modelling	Unverified	0
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information	May 9, 2025	Benchmarking, Form	Unverified	0
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning	May 7, 2025	Multiple-choice, Question Answering	Code Available	2
MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks	May 6, 2025	Benchmarking, Multiple-choice	Code Available	0
ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant	May 6, 2025	Descriptive, Multiple-choice	Code Available	0
Developing A Framework to Support Human Evaluation of Bias in Generated Free Response Text	May 5, 2025	Multiple-choice	Unverified	0
Unlearning vs. Obfuscation: Are We Truly Removing Knowledge?	May 5, 2025	Multiple-choice	Unverified	0
LLM-based Text Simplification and its Effect on User Comprehension and Cognitive Load	May 4, 2025	Articles, Multiple-choice	Unverified	0
LookAlike: Consistent Distractor Generation in Math MCQs	May 3, 2025	Distractor Generation, Math	Unverified	0
Harnessing Structured Knowledge: A Concept Map-Based Approach for High-Quality Multiple Choice Question Generation with Effective Distractors	May 2, 2025	High School Physics, Misconceptions	Code Available	0
Adaptive Wizard for Removing Cross-Tier Misconfigurations in Active Directory	May 2, 2025	Multiple-choice	Unverified	0
SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning	Apr 22, 2025	Multiple-choice, reinforcement-learning	Unverified	0
LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception	Apr 21, 2025	Math, MMLU	Unverified	0
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding	Apr 20, 2025	Autonomous Driving, Image Captioning	Code Available	0
FarsEval-PKBETS: A new diverse benchmark for evaluating Persian large language models	Apr 20, 2025	Descriptive, Ethics	Unverified	0
Assessing AI-Generated Questions' Alignment with Cognitive Frameworks in Educational Assessment	Apr 19, 2025	Classification, Multiple-choice	Unverified	0
DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain	Apr 18, 2025	Multiple-choice	Unverified	0
D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Model	Apr 18, 2025	Distractor Generation, Multiple-choice	Unverified	0
Benchmarking Next-Generation Reasoning-Focused Large Language Models in Ophthalmology: A Head-to-Head Evaluation on 5,888 Items	Apr 15, 2025	Benchmarking, Multiple-choice	Unverified	0
AgMMU: A Comprehensive Agricultural Multimodal Understanding and Reasoning Benchmark	Apr 14, 2025	Management, Multiple-choice	Unverified	0
Large Language Models Could Be Rote Learners	Apr 11, 2025	Memorization, MMLU	Unverified	0
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation	Apr 9, 2025	Multiple-choice	Code Available	0
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering	Apr 7, 2025	Chart Question Answering, Chart Understanding	Code Available	1


No leaderboard results yet.