SOTAVerified

Decision Making

Papers

Showing 91-100 of 12311 papers

| Title | Status | Hype |
| --- | --- | --- |
| PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models | Code | 2 |
| LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models | Code | 2 |
| GaussianAD: Gaussian-Centric End-to-End Autonomous Driving | Code | 2 |
| Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning | Code | 2 |
| Doe-1: Closed-Loop Autonomous Driving with Large World Model | Code | 2 |
| GPD-1: Generative Pre-training for Driving | Code | 2 |
| A Comprehensive Guide to Explainable AI: From Classical Models to LLMs | Code | 2 |
| GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI | Code | 2 |
| Natural Language Reinforcement Learning | Code | 2 |
| Disentangling Memory and Reasoning Ability in Large Language Models | Code | 2 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified |