SOTAVerified

Multiple-choice

Papers

Showing 21-30 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| SEED-Bench: Benchmarking Multimodal Large Language Models | Code | 3 |
| Fine-Tuning Language Models with Just Forward Passes | Code | 3 |
| C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models | Code | 3 |
| VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning | Code | 2 |
| Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Code | 2 |
| EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning | Code | 2 |
| Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Code | 2 |
| Mellow: a small audio language model for reasoning | Code | 2 |
| BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology | Code | 2 |
| Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Code | 2 |

No leaderboard results yet.