R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO | May 22, 2025 | Reinforcement Learning (RL)
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | May 5, 2025 | Reinforcement Learning (RL)
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning | Apr 15, 2025 | Mathematical Reasoning, Reinforcement Learning (RL)
Deep Reinforcement Learning | Oct 15, 2018 | Deep Reinforcement Learning, Management
Practical Deep Reinforcement Learning Approach for Stock Trading | Nov 19, 2018 | Deep Reinforcement Learning, reinforcement-learning
OpenSpiel: A Framework for Reinforcement Learning in Games | Aug 26, 2019 | General Reinforcement Learning, reinforcement-learning
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning | Feb 5, 2024 | Reinforcement Learning
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning | May 22, 2025 | Reinforcement Learning (RL)
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning | May 13, 2025 | Reinforcement Learning (RL), Visual Reasoning
Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning | Feb 26, 2024 | GPU, Minecraft
OGBench: Benchmarking Offline Goal-Conditioned RL | Oct 26, 2024 | Benchmarking, reinforcement-learning
o1-Coder: an o1 Replication for Coding | Nov 29, 2024 | Reinforcement Learning (RL)
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research | May 16, 2023 | Philosophy, reinforcement-learning
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse | Mar 24, 2025 | Layout Generation, Reinforcement Learning (RL)
CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control | Oct 4, 2024 | Motion Generation, Reinforcement Learning (RL)
CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms | Nov 16, 2021 | Benchmarking, Deep Reinforcement Learning
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs | Mar 3, 2025 | Reinforcement Learning (RL)
Adversarial Cheap Talk | Nov 20, 2022 | Meta-Learning, Reinforcement Learning (RL)
On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning | Nov 10, 2021 | Multi-agent Reinforcement Learning, reinforcement-learning
A Clean Slate for Offline Reinforcement Learning | Apr 15, 2025 | Offline RL, reinforcement-learning
Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms | Jan 22, 2024 | Evolutionary Algorithms, reinforcement-learning
MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library | Oct 11, 2022 | Multi-agent Reinforcement Learning, reinforcement-learning
ACEGEN: Reinforcement learning of generative chemical agents for drug discovery | May 7, 2024 | Benchmarking, Decision Making
Learning Bipedal Walking for Humanoids with Current Feedback | Mar 7, 2023 | Deep Reinforcement Learning, Reinforcement Learning (RL)
Learning Bipedal Walking On Planned Footsteps For Humanoid Robots | Jul 26, 2022 | Deep Reinforcement Learning, MuJoCo
Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning | Jan 26, 2023 | Benchmarking, Deep Reinforcement Learning
imitation: Clean Imitation Learning Implementations | Nov 22, 2022 | Imitation Learning, reinforcement-learning
Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward | May 18, 2025 | GPU, Graph Matching
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL | Feb 29, 2024 | Language Modeling
Accelerating Goal-Conditioned RL Algorithms and Research | Aug 20, 2024 | GPU, reinforcement-learning
Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL | May 22, 2025 | Natural Language Understanding, Reinforcement Learning (RL)
Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning | Oct 11, 2022 | Reinforcement Learning
Perception-R1: Pioneering Perception Policy with Reinforcement Learning | Apr 10, 2025 | Reinforcement Learning
FurnitureBench: Reproducible Real-World Benchmark for Long-Horizon Complex Manipulation | May 22, 2023 | Imitation Learning, Motion Planning
Foundation Policies with Hilbert Representations | Feb 23, 2024 | Reinforcement Learning (RL), Unsupervised Pre-training
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning | May 19, 2025 | Language Modeling
FlowReasoner: Reinforcing Query-Level Meta-Agents | Apr 21, 2025 | Reinforcement Learning (RL)
Generalized Inner Loop Meta-Learning | Oct 3, 2019 | Meta-Learning, reinforcement-learning
FlagVNE: A Flexible and Generalizable Reinforcement Learning Framework for Network Resource Allocation | Apr 19, 2024 | Decoder, Network Embedding
Flightmare: A Flexible Quadrotor Simulator | Sep 1, 2020 | Deep Reinforcement Learning, reinforcement-learning
FinRL-Meta: A Universe of Near-Real Market Environments for Data-Driven Deep Reinforcement Learning in Quantitative Finance | Dec 13, 2021 | Deep Reinforcement Learning, GPU
Flow: A Modular Learning Framework for Mixed Autonomy Traffic | Oct 16, 2017 | Autonomous Vehicles, Deep Reinforcement Learning
Smooth Exploration for Robotic Reinforcement Learning | May 12, 2020 | Continuous Control
Fiber: A Platform for Efficient Development and Distributed Training for Reinforcement Learning and Population-Based Methods | Mar 25, 2020 | Distributed Computing, Reinforcement Learning
Feedback Efficient Online Fine-Tuning of Diffusion Models | Feb 26, 2024 | Reinforcement Learning
Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design | Oct 17, 2024 | Protein Design, Reinforcement Learning (RL)
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Feb 10, 2025 | Math, Mathematical Reasoning
AndroidEnv: A Reinforcement Learning Platform for Android | May 27, 2021 | Reinforcement Learning
EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking | Apr 2, 2024 | Benchmarking, Reinforcement Learning (RL)
Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning | Mar 19, 2024 | Inductive Bias, Reinforcement Learning (RL)