SOTAVerified

Decision Making

Papers

Showing 76-100 of 12311 papers

Title | Status | Hype
Agentic Knowledgeable Self-awareness | Code | 2
MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search | Code | 2
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games | Code | 2
V-Max: A Reinforcement Learning Framework for Autonomous Driving | Code | 2
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories | Code | 2
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories | Code | 2
What Makes a Good Diffusion Planner for Decision Making? | Code | 2
Digital Player: Evaluating Large Language Models based Human-like Agent in Games | Code | 2
Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support | Code | 2
Hierarchical Expert Prompt for Large-Language-Model: An Approach Defeat Elite AI in TextStarCraft II for the First Time | Code | 2
On the Guidance of Flow Matching | Code | 2
OptiChat: Bridging Optimization Models and Practitioners with Large Language Models | Code | 2
LeapVAD: A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking | Code | 2
UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation | Code | 2
Mechanistic understanding and validation of large AI models with SemanticLens | Code | 2
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models | Code | 2
LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models | Code | 2
GaussianAD: Gaussian-Centric End-to-End Autonomous Driving | Code | 2
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning | Code | 2
Doe-1: Closed-Loop Autonomous Driving with Large World Model | Code | 2
GPD-1: Generative Pre-training for Driving | Code | 2
A Comprehensive Guide to Explainable AI: From Classical Models to LLMs | Code | 2
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI | Code | 2
Natural Language Reinforcement Learning | Code | 2
Disentangling Memory and Reasoning Ability in Large Language Models | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | - | Unverified