SOTAVerified

Multiple-choice

Papers

Showing 41–50 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| My Answer Is NOT 'Fair': Mitigating Social Bias in Vision-Language Models via Fair and Biased Residuals | | 0 |
| Genome-Bench: A Scientific Reasoning Benchmark from Real-World Expert Discussions | | 0 |
| CP-Router: An Uncertainty-Aware Router Between LLM and LRM | | 0 |
| DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response | | 0 |
| BnMMLU: Measuring Massive Multitask Language Understanding in Bengali | Code | 0 |
| Enhancing LLMs' Reasoning-Intensive Multimedia Search Capabilities through Fine-Tuning and Reinforcement Learning | | 0 |
| Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities | Code | 1 |
| KoBALT: Korean Benchmark For Advanced Linguistic Tasks | | 0 |
| Collaboration among Multiple Large Language Models for Medical Question Answering | | 0 |
| AutoMCQ -- Automatically Generate Code Comprehension Questions using GenAI | | 0 |

No leaderboard results yet.