SOTAVerified

Multiple-choice

Papers

Showing 401-410 of 1107 papers

Title | Status | Hype
LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning | - | 0
VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models | - | 0
Truth Knows No Language: Evaluating Truthfulness Beyond English | Code | 0
Objective quantification of mood states using large language models | - | 0
A Semantic Parsing Algorithm to Solve Linear Ordering Problems | - | 0
SB-Bench: Stereotype Bias Benchmark for Large Multimodal Models | - | 0
Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs | - | 0
PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian | - | 0
Tokenization Standards for Linguistic Integrity: Turkish as a Benchmark | - | 0
HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models | Code | 0

No leaderboard results yet.