SOTAVerified

Decision Making

Papers

Showing 31-40 of 12311 papers

Title | Status | Hype
Constitutional AI: Harmlessness from AI Feedback | Code | 4
AgentBench: Evaluating LLMs as Agents | Code | 4
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning | Code | 4
Cognitive Architectures for Language Agents | Code | 4
Mastering Diverse Domains through World Models | Code | 4
pgmpy: A Python Toolkit for Bayesian Networks | Code | 4
Behavior Generation with Latent Actions | Code | 3
FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Code | 3
Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models | Code | 3
ACEGEN: Reinforcement learning of generative chemical agents for drug discovery | Code | 3
Page 4 of 1232

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | - | Unverified