SOTAVerified

Decision Making

Papers

Showing 291–300 of 12311 papers

Title | Status | Hype
Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models | Code | 1
Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback | Code | 1
DiffLight: A Partial Rewards Conditioned Diffusion Model for Traffic Signal Control with Missing Data | Code | 1
Toward Conditional Distribution Calibration in Survival Prediction | Code | 1
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Code | 1
Reflection-Bench: probing AI intelligence with reflection | Code | 1
A Comprehensive Evaluation of Cognitive Biases in LLMs | Code | 1
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation | Code | 1
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation | Code | 1
Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified