SOTAVerified

Sequential Decision Making

Papers

Showing 1-50 of 1210 papers

Title | Status | Hype
Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey | Code | 5
Eureka: Human-Level Reward Design via Coding Large Language Models | Code | 4
Reflexion: Language Agents with Verbal Reinforcement Learning | Code | 4
MineStudio: A Streamlined Package for Minecraft AI Agent Development | Code | 3
Reinforcement Learning Meets Visual Odometry | Code | 3
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents | Code | 2
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO | Code | 2
MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading | Code | 2
Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions | Code | 2
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent | Code | 2
STEVE-1: A Generative Model for Text-to-Behavior in Minecraft | Code | 2
Trieste: Efficiently Exploring The Depths of Black-box Functions with TensorFlow | Code | 2
ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency | Code | 2
Dungeons and Data: A Large-Scale NetHack Dataset | Code | 2
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem | Code | 2
Pre-Trained Language Models for Interactive Decision-Making | Code | 2
Large Language Models for Planning: A Comprehensive and Systematic Survey | Code | 1
LLINBO: Trustworthy LLM-in-the-Loop Bayesian Optimization | Code | 1
Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Tasks | Code | 1
On Generalization Across Environments In Multi-Objective Reinforcement Learning | Code | 1
Reinforcement learning with combinatorial actions for coupled restless bandits | Code | 1
Training a Generally Curious Agent | Code | 1
Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies | Code | 1
LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation | Code | 1
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback | Code | 1
Learning Discrete World Models for Heuristic Search | Code | 1
RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning | Code | 1
Re-ReST: Reflection-Reinforced Self-Training for Language Agents | Code | 1
Pursuing Overall Welfare in Federated Learning through Sequential Decision Making | Code | 1
Rethinking Transformers in Solving POMDPs | Code | 1
Decision Mamba: Reinforcement Learning via Sequence Modeling with Selective State Spaces | Code | 1
Reinforced Sequential Decision-Making for Sepsis Treatment: The POSNEGDM Framework with Mortality Classifier and Transformer | Code | 1
TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision | Code | 1
How Can LLM Guide RL? A Value-Based Approach | Code | 1
PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control | Code | 1
Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss | Code | 1
Sym-Q: Adaptive Symbolic Regression via Sequential Decision-Making | Code | 1
Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills | Code | 1
Layered and Staged Monte Carlo Tree Search for SMT Strategy Synthesis | Code | 1
LLF-Bench: Benchmark for Interactive Learning from Language Feedback | Code | 1
Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym | Code | 1
Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents | Code | 1
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning | Code | 1
Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments | Code | 1
Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback | Code | 1
ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource Allocation | Code | 1
Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent | Code | 1
Simplified Temporal Consistency Reinforcement Learning | Code | 1
Decision Stacks: Flexible Reinforcement Learning via Modular Generative Models | Code | 1
Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach | Code | 1

No leaderboard results yet.