SOTAVerified

Sequential Decision Making

Papers

Showing 251-300 of 1210 papers

| Title | Status | Hype |
| --- | --- | --- |
| "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations | Code | 0 |
| Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives | Code | 0 |
| Rectifying Reinforcement Learning for Reward Matching | | 0 |
| Re-ReST: Reflection-Reinforced Self-Training for Language Agents | Code | 1 |
| Combining Experimental and Historical Data for Policy Evaluation | Code | 0 |
| Reward Machines for Deep RL in Noisy and Uncertain Environments | Code | 0 |
| Pursuing Overall Welfare in Federated Learning through Sequential Decision Making | Code | 1 |
| Low-rank finetuning for LLMs: A fairness perspective | | 0 |
| OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators | | 0 |
| Leveraging Offline Data in Linear Latent Bandits | | 0 |
| Rethinking Transformers in Solving POMDPs | Code | 1 |
| Variational Offline Multi-agent Skill Discovery | | 0 |
| Inference of Utilities and Time Preference in Sequential Decision-Making | | 0 |
| Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning | | 0 |
| Reinforcing Language Agents via Policy Optimization with Action Decomposition | | 0 |
| Efficiently Training Deep-Learning Parametric Policies using Lagrangian Duality | | 0 |
| Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making | | 0 |
| A finite time analysis of distributed Q-learning | | 0 |
| FLIPHAT: Joint Differential Privacy for High Dimensional Sparse Linear Bandits | Code | 0 |
| On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models | | 0 |
| A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback | | 0 |
| Optimistic Query Routing in Clustering-based Approximate Maximum Inner Product Search | Code | 0 |
| Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? | Code | 0 |
| CPS-LLM: Large Language Model based Safe Usage Plan Generator for Human-in-the-Loop Human-in-the-Plant Cyber-Physical System | | 0 |
| AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | | 0 |
| Human-Modeling in Sequential Decision-Making: An Analysis through the Lens of Human-Aware AI | | 0 |
| Learning Planning Abstractions from Language | | 0 |
| Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows | | 0 |
| Enhancing Q-Learning with Large Language Model Heuristics | | 0 |
| MEXGEN: An Effective and Efficient Information Gain Approximation for Information Gathering Path Planning | | 0 |
| Mathematics of statistical sequential decision-making: concentration, risk-awareness and modelling in stochastic bandits, with applications to bariatric surgery | | 0 |
| Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback | | 0 |
| Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks | | 0 |
| Q-learning with temporal memory to navigate turbulence | | 0 |
| Digital Twins for forecasting and decision optimisation with machine learning: applications in wastewater treatment | | 0 |
| What Hides behind Unfairness? Exploring Dynamics Fairness in Reinforcement Learning | Code | 0 |
| Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation | | 0 |
| Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity | Code | 0 |
| Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection | Code | 0 |
| Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery | | 0 |
| Multi-Agent Soft Actor-Critic with Coordinated Loss for Autonomous Mobility-on-Demand Fleet Control | Code | 0 |
| Deep Reinforcement Learning for Personalized Diagnostic Decision Pathways Using Electronic Health Records: A Comparative Study on Anemia and Systemic Lupus Erythematosus | Code | 0 |
| Regularized Conditional Diffusion Model for Multi-Task Preference Alignment | | 0 |
| Composite Bayesian Optimization In Function Spaces Using NEON -- Neural Epistemic Operator Networks | | 0 |
| Multi-granular Adversarial Attacks against Black-box Neural Ranking Models | | 0 |
| Decision Mamba: Reinforcement Learning via Sequence Modeling with Selective State Spaces | Code | 1 |
| Retentive Decision Transformer with Adaptive Masking for Reinforcement Learning based Recommendation Systems | | 0 |
| Mixed-Initiative Human-Robot Teaming under Suboptimality with Online Bayesian Adaptation | Code | 0 |
| Continual Vision-and-Language Navigation | | 0 |
| Sequential Decision-Making for Inline Text Autocomplete | | 0 |

No leaderboard results yet.