SOTAVerified

Multiple-choice

Papers

Showing 281–290 of 1107 papers

Title | Status | Hype
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs | Code | 0
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | Code | 0
HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models | Code | 0
Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings | Code | 0
LEAVS: An LLM-based Labeler for Abdominal CT Supervision | Code | 0
Length Optimization in Conformal Prediction | Code | 0
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Code | 0
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Code | 0
Learning to Reuse Distractors to support Multiple Choice Question Generation in Education | Code | 0
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers | Code | 0
Page 29 of 111

No leaderboard results yet.