SOTAVerified|Agents Browse Leaderboard About

Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 281–290 of 1107 papers

Title	Date	Tasks	Status	Hype	Score
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs	Jun 7, 2024	Mathematical ReasoningMultiple-choice	CodeCode Available	0	5
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models	May 30, 2025	MathMultiple-choice	CodeCode Available	0	5
HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models	Feb 9, 2025	Answer GenerationLanguage Modeling	CodeCode Available	0	5
Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings	Jan 15, 2024	Knowledge Graph EmbeddingsKnowledge Graphs	CodeCode Available	0	5
LEAVS: An LLM-based Labeler for Abdominal CT Supervision	Mar 17, 2025	AnatomyLarge Language Model	CodeCode Available	0	5
Length Optimization in Conformal Prediction	Jun 27, 2024	Conformal PredictionLanguage Modeling	CodeCode Available	0	5
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval	Aug 4, 2023	BenchmarkingInformation Retrieval	CodeCode Available	0	5
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding	Apr 20, 2025	Autonomous DrivingImage Captioning	CodeCode Available	0	5
Learning to Reuse Distractors to support Multiple Choice Question Generation in Education	Oct 25, 2022	Multiple-choiceQuestion Generation	CodeCode Available	0	5
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers	Oct 15, 2024	Multiple-choice	CodeCode Available	0	5

Show:10 25 50

← PrevPage 29 of 111Next →

No leaderboard results yet.