SOTAVerified

Multiple-choice

Papers

Showing 421-430 of 1107 papers

Title | Status | Hype
Attribution analysis of legal language as used by LLM | - | 0
Options-Aware Dense Retrieval for Multiple-Choice query Answering | - | 0
HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI | - | 0
LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering | - | 0
LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion | - | 0
Option-ID Based Elimination For Multiple Choice Questions | Code | 0
Humanity's Last Exam | - | 0
On the Reasoning Capacity of AI Models and How to Quantify It | - | 0
Auto-Evaluation: A Critical Measure in Driving Improvements in Quality and Safety of AI-Generated Lesson Resources | - | 0
Patent Figure Classification using Large Vision-language Models | Code | 0

No leaderboard results yet.