SOTAVerified

Multiple-choice

Papers

Showing 191-200 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Town Hall Debate Prompting: Enhancing Logical Reasoning in LLMs through Multi-Persona Interaction | | 0 |
| Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection | | 0 |
| Attribution analysis of legal language as used by LLM | | 0 |
| Options-Aware Dense Retrieval for Multiple-Choice query Answering | | 0 |
| HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI | | 0 |
| LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering | | 0 |
| LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion | | 0 |
| Option-ID Based Elimination For Multiple Choice Questions | Code | 0 |
| Humanity's Last Exam | | 0 |
| Auto-Evaluation: A Critical Measure in Driving Improvements in Quality and Safety of AI-Generated Lesson Resources | | 0 |

No leaderboard results yet.