SOTAVerified

Multiple-choice

Papers

Showing 176–200 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| SB-Bench: Stereotype Bias Benchmark for Large Multimodal Models | | 0 |
| A Semantic Parsing Algorithm to Solve Linear Ordering Problems | | 0 |
| Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs | | 0 |
| PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian | | 0 |
| Tokenization Standards for Linguistic Integrity: Turkish as a Benchmark | | 0 |
| HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models | Code | 0 |
| Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning | Code | 0 |
| ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning | Code | 0 |
| The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs | | 0 |
| LLMs to Support a Domain Specific Knowledge Assistant | | 0 |
| TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes | Code | 1 |
| Evalita-LLM: Benchmarking Large Language Models on Italian | | 0 |
| The Use of Artificial Intelligence Tools in Assessing Content Validity: A Comparative Study with Human Experts | | 0 |
| CoddLLM: Empowering Large Language Models for Data Analytics | | 0 |
| InnerThoughts: Disentangling Representations and Predictions in Large Language Models | | 0 |
| Town Hall Debate Prompting: Enhancing Logical Reasoning in LLMs through Multi-Persona Interaction | | 0 |
| Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection | | 0 |
| Attribution analysis of legal language as used by LLM | | 0 |
| Options-Aware Dense Retrieval for Multiple-Choice query Answering | | 0 |
| HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI | | 0 |
| LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering | | 0 |
| LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion | | 0 |
| Option-ID Based Elimination For Multiple Choice Questions | Code | 0 |
| Humanity's Last Exam | | 0 |
| Auto-Evaluation: A Critical Measure in Driving Improvements in Quality and Safety of AI-Generated Lesson Resources | | 0 |

No leaderboard results yet.