SOTAVerified

Decision Making

Papers

Showing 51–100 of 12311 papers

Title | Status | Hype
A Demonstration of Adaptive Collaboration of Large Language Models for Medical Decision-Making | Code | 3
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Code | 3
Sentiment Reasoning for Healthcare | Code | 3
Reinforcement Learning Meets Visual Odometry | Code | 3
ACEGEN: Reinforcement learning of generative chemical agents for drug discovery | Code | 3
Evolve Cost-aware Acquisition Functions Using Large Language Models | Code | 3
MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making | Code | 3
Enhancing Decision Analysis with a Large Language Model: pyDecision a Comprehensive Library of MCDA Methods in Python | Code | 3
Automatic Gradient Estimation for Calibrating Crowd Models with Discrete Decision Making | Code | 3
Behavior Generation with Latent Actions | Code | 3
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping | Code | 3
UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction | Code | 3
SPO: Sequential Monte Carlo Policy Optimisation | Code | 3
V-IRL: Grounding Virtual Intelligence in Real Life | Code | 3
PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language Models | Code | 3
Evaluating Language Model Agency through Negotiations | Code | 3
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning | Code | 3
Hierarchical Prompting Assists Large Language Model on Web Navigation | Code | 3
Planning with Diffusion for Flexible Behavior Synthesis | Code | 3
Attention is not not Explanation | Code | 3
NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments | Code | 2
CausalPFN: Amortized Causal Effect Estimation via In-Context Learning | Code | 2
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning | Code | 2
Multi-Agent Reinforcement Learning for Resources Allocation Optimization: A Survey | Code | 2
Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models | Code | 2
Agentic Knowledgeable Self-awareness | Code | 2
MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search | Code | 2
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games | Code | 2
V-Max: A Reinforcement Learning Framework for Autonomous Driving | Code | 2
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories | Code | 2
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories | Code | 2
What Makes a Good Diffusion Planner for Decision Making? | Code | 2
Digital Player: Evaluating Large Language Models based Human-like Agent in Games | Code | 2
Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support | Code | 2
Hierarchical Expert Prompt for Large-Language-Model: An Approach Defeat Elite AI in TextStarCraft II for the First Time | Code | 2
On the Guidance of Flow Matching | Code | 2
OptiChat: Bridging Optimization Models and Practitioners with Large Language Models | Code | 2
LeapVAD: A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking | Code | 2
UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation | Code | 2
Mechanistic understanding and validation of large AI models with SemanticLens | Code | 2
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models | Code | 2
LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models | Code | 2
GaussianAD: Gaussian-Centric End-to-End Autonomous Driving | Code | 2
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning | Code | 2
Doe-1: Closed-Loop Autonomous Driving with Large World Model | Code | 2
GPD-1: Generative Pre-training for Driving | Code | 2
A Comprehensive Guide to Explainable AI: From Classical Models to LLMs | Code | 2
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI | Code | 2
Natural Language Reinforcement Learning | Code | 2
Disentangling Memory and Reasoning Ability in Large Language Models | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SRLA | Average Remaining Cycles | 6.4 | | Unverified