SOTAVerified

Q-Learning

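Q-learning is a model-free reinforcement learning algorithm. Its goal is to learn an action-value function Q(s, a) from which a policy can be read off: in each state, the agent takes the action with the highest estimated value.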

(Image credit: Playing Atari with Deep Reinforcement Learning)
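
To make the description above concrete, here is a minimal sketch of tabular Q-learning. It is an illustration under stated assumptions, not the method of any paper listed on this page: the five-state chain environment, the function names (`q_learning_update`, `greedy_policy`, `train_chain`), and the hyperparameter values are all hypothetical choices made for this example. Since Q-learning is off-policy, the sketch collects experience with a uniformly random behavior policy and reads the greedy policy off the learned table afterwards.

```python
import random
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Bootstrap from the best next action; terminal states contribute no future value.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def greedy_policy(q, state, actions):
    """Return the action with the highest estimated value in `state`."""
    return max(actions, key=lambda a: q[(state, a)])


def train_chain(n_states=5, episodes=500, seed=0):
    """Train on a toy chain (an assumption for this sketch).

    States are 0..n_states-1, actions are 'left' and 'right', and the only
    reward (1.0) is given on reaching the last state, which ends the episode.
    """
    rng = random.Random(seed)
    actions = ["left", "right"]
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # Uniformly random behavior policy; Q-learning still learns greedy values.
            action = rng.choice(actions)
            next_state = max(state - 1, 0) if action == "left" else state + 1
            done = next_state == n_states - 1
            reward = 1.0 if done else 0.0
            q_learning_update(q, state, action, reward, next_state, actions, done=done)
            state = next_state
    return q, actions


if __name__ == "__main__":
    q, actions = train_chain()
    for s in range(4):
        print(s, greedy_policy(q, s, actions))
```

The update rule is the part that carries over to other settings; the deep variants in the list below replace the table `q` with a function approximator, which this sketch does not attempt.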

Papers

Showing 1-50 of 1918 papers

Title | Status | Hype
ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy | Code | 3
Flow Q-Learning | Code | 3
Streaming Deep Reinforcement Learning Finally Works | Code | 3
Simplifying Deep Temporal Difference Learning | Code | 3
Digi-Q: Learning Q-Value Functions for Training Device-Control Agents | Code | 2
Pretrained LLM Adapted with LoRA as a Decision Transformer for Offline RL in Quantitative Trading | Code | 2
Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather | Code | 2
Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving | Code | 2
Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding | Code | 2
Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning | Code | 2
ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency | Code | 2
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning | Code | 2
Offline RL for Natural Language Generation with Implicit Language Q Learning | Code | 2
rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch | Code | 2
POPGym Arcade: Parallel Pixelated POMDPs | Code | 1
Zonal RL-RRT: Integrated RL-RRT Path Planning with Collision Probability and Zone Connectivity | Code | 1
Reward-free World Models for Online Imitation Learning | Code | 1
Reinforcement Learning in High-frequency Market Making | Code | 1
Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation | Code | 1
PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer | Code | 1
Strategically Conservative Q-Learning | Code | 1
Towards Universal and Black-Box Query-Response Only Attack on LLMs with QROA | Code | 1
Diffusion Policies creating a Trust Region for Offline Reinforcement Learning | Code | 1
A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning | Code | 1
Research on Robot Path Planning Based on Reinforcement Learning | Code | 1
Laser Learning Environment: A new environment for coordination-critical multi-agent tasks | Code | 1
Towards Optimal Adversarial Robust Q-learning with Bellman Infinity-error | Code | 1
Multi-Agent Reinforcement Learning via Distributed MPC as a Function Approximator | Code | 1
Optimistic Multi-Agent Policy Gradient | Code | 1
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning | Code | 1
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption | Code | 1
Deep Reinforcement Learning-based Intelligent Traffic Signal Controls with Optimized CO2 emissions | Code | 1
Boosting Continuous Control with Consistency Policy | Code | 1
PGDQN: Preference-Guided Deep Q-Network | Code | 1
Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning | Code | 1
Reasoning with Latent Diffusion in Offline Reinforcement Learning | Code | 1
Robust Multi-Agent Reinforcement Learning with State Uncertainty | Code | 1
MADiff: Offline Multi-agent Learning with Diffusion Models | Code | 1
When should we prefer Decision Transformers for Offline Reinforcement Learning? | Code | 1
IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies | Code | 1
Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization | Code | 1
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning | Code | 1
LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning | Code | 1
TransfQMix: Transformers for Leveraging the Graph Structure of Multi-Agent Reinforcement Learning Problems | Code | 1
Extreme Q-Learning: MaxEnt RL without Entropy | Code | 1
Learning a Generic Value-Selection Heuristic Inside a Constraint Programming Solver | Code | 1
Solving Continuous Control via Q-learning | Code | 1
Sustainable Online Reinforcement Learning for Auto-bidding | Code | 1
Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient | Code | 1
Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials | Code | 1

No leaderboard results yet.