SOTAVerified

Sequential Decision Making

Papers

Showing 1-50 of 1210 papers

Title | Status | Hype
Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey | Code | 5
Reflexion: Language Agents with Verbal Reinforcement Learning | Code | 4
Eureka: Human-Level Reward Design via Coding Large Language Models | Code | 4
Reinforcement Learning Meets Visual Odometry | Code | 3
MineStudio: A Streamlined Package for Minecraft AI Agent Development | Code | 3
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents | Code | 2
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent | Code | 2
ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency | Code | 2
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem | Code | 2
Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions | Code | 2
Trieste: Efficiently Exploring The Depths of Black-box Functions with TensorFlow | Code | 2
STEVE-1: A Generative Model for Text-to-Behavior in Minecraft | Code | 2
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO | Code | 2
MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading | Code | 2
Dungeons and Data: A Large-Scale NetHack Dataset | Code | 2
Pre-Trained Language Models for Interactive Decision-Making | Code | 2
Learning Dynamic Belief Graphs to Generalize on Text-Based Games | Code | 1
Layered and Staged Monte Carlo Tree Search for SMT Strategy Synthesis | Code | 1
Learning Multi-Level Hierarchies with Hindsight | Code | 1
Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents | Code | 1
CertRL: Formalizing Convergence Proofs for Value and Policy Iteration in Coq | Code | 1
IQ-Learn: Inverse soft-Q Learning for Imitation | Code | 1
Large Language Models for Planning: A Comprehensive and Systematic Survey | Code | 1
Independent Reinforcement Learning for Weakly Cooperative Multiagent Traffic Control Problem | Code | 1
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization | Code | 1
LLF-Bench: Benchmark for Interactive Learning from Language Feedback | Code | 1
Can Increasing Input Dimensionality Improve Deep Reinforcement Learning? | Code | 1
Bridging POMDPs and Bayesian decision making for robust maintenance planning under model uncertainty: An application to railway systems | Code | 1
Learning Discrete World Models for Heuristic Search | Code | 1
Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym | Code | 1
Efficient Symptom Inquiring and Diagnosis via Adaptive Alignment of Reinforcement Learning and Classification | Code | 1
Efficient Nonmyopic Bayesian Optimization via One-Shot Multi-Step Trees | Code | 1
Extracting Reward Functions from Diffusion Models | Code | 1
Dynamic Causal Bayesian Optimization | Code | 1
Deep Reinforcement Learning for Entity Alignment | Code | 1
Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints | Code | 1
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning | Code | 1
Decision Stacks: Flexible Reinforcement Learning via Modular Generative Models | Code | 1
AdaPlanner: Adaptive Planning from Feedback with Language Models | Code | 1
Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making | Code | 1
An Alternative Softmax Operator for Reinforcement Learning | Code | 1
An empirical evaluation of active inference in multi-armed bandits | Code | 1
Effective Reinforcement Learning through Evolutionary Surrogate-Assisted Prescription | Code | 1
Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions | Code | 1
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback | Code | 1
Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach | Code | 1
Adaptive Stress Testing of Trajectory Predictions in Flight Management Systems | Code | 1
Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback | Code | 1
Hybrid Multi-agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems | Code | 1
Decision Mamba: Reinforcement Learning via Sequence Modeling with Selective State Spaces | Code | 1
