DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Jan 22, 2025 | Mathematical Reasoning Multi-task Language Understanding | Code Available 155
Introduction to Reinforcement Learning | Aug 13, 2024 | reinforcement-learning Reinforcement Learning | Code Available 115
Gymnasium: A Standard Interface for Reinforcement Learning Environments | Jul 24, 2024 | reinforcement-learning Reinforcement Learning | Code Available 115
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Apr 10, 2025 | Language Modeling Language Modelling | Code Available 95
SkyReels-V2: Infinite-length Film Generative Model | Apr 17, 2025 | Large Language Model model | Code Available 95
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | May 7, 2024 | Language Modeling Language Modelling | Code Available 95
TTRL: Test-Time Reinforcement Learning | Apr 22, 2025 | Math reinforcement-learning | Code Available 75
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | Jun 16, 2025 | Mixture-of-Experts Reinforcement Learning (RL) | Code Available 75
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Feb 20, 2025 | Math reinforcement-learning | Code Available 75
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | May 30, 2025 | GPU Math | Code Available 75
EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning | Jan 25, 2025 | Benchmarking Evolutionary Algorithms | Code Available 75
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning | Apr 24, 2025 | Decision Making Reinforcement Learning (RL) | Code Available 75
Kimi k1.5: Scaling Reinforcement Learning with LLMs | Jan 22, 2025 | Math reinforcement-learning | Code Available 75
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Mar 12, 2025 | Question Answering RAG | Code Available 75
Flow-GRPO: Training Flow Matching Models via Online RL | May 8, 2025 | Denoising Diversity | Code Available 75
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild | Mar 24, 2025 | Instruction Following Math | Code Available 75
An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents | May 21, 2025 | Reinforcement Learning (RL) | Code Available 75
FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning | Nov 6, 2022 | Deep Reinforcement Learning reinforcement-learning | Code Available 65
The Dormant Neuron Phenomenon in Deep Reinforcement Learning | Feb 24, 2023 | Deep Reinforcement Learning reinforcement-learning | Code Available 65
RAG-R1 : Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism | Jun 30, 2025 | Question Answering RAG | Code Available 55
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | May 30, 2025 | Reinforcement Learning (RL) | Code Available 55
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation | May 31, 2024 | MuJoCo reinforcement-learning | Code Available 55
Process Reinforcement through Implicit Rewards | Feb 3, 2025 | Math Reinforcement Learning (RL) | Code Available 55
Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments | Jan 10, 2023 | GPU Imitation Learning | Code Available 55
ZeroSearch: Incentivize the Search Capability of LLMs without Searching | May 7, 2025 | Reinforcement Learning (RL) Retrieval | Code Available 55
EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine | Jun 21, 2022 | MuJoCo reinforcement-learning | Code Available 55
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | Mar 9, 2025 | Math Multimodal Reasoning | Code Available 55
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning | Jun 23, 2025 | Reinforcement Learning (RL) Text Generation | Code Available 55
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions | Nov 21, 2024 | Reinforcement Learning (RL) | Code Available 55
Understanding R1-Zero-Like Training: A Critical Perspective | Mar 26, 2025 | Reinforcement Learning (RL) | Code Available 55
Kimi-VL Technical Report | Apr 10, 2025 | Long-Context Understanding Mathematical Reasoning | Code Available 55
DanceGRPO: Unleashing GRPO on Visual Generation | May 12, 2025 | Denoising reinforcement-learning | Code Available 55
Group-in-Group Policy Optimization for LLM Agent Training | May 16, 2025 | GPU Mathematical Reasoning | Code Available 55
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs | Dec 25, 2024 | Reinforcement Learning (RL) | Code Available 55
Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer | Apr 8, 2024 | MuJoCo Physical Simulations | Code Available 55
Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey | Aug 19, 2024 | Autonomous Driving Decision Making | Code Available 55
SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models | Jun 15, 2025 | Logical Reasoning Reinforcement Learning (RL) | Code Available 55
Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation | May 2, 2024 | MuJoCo Reinforcement Learning (RL) | Code Available 55
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning | Mar 20, 2025 | Decision Making Language Modeling | Code Available 45
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning | May 23, 2025 | Question Answering Reinforcement Learning (RL) | Code Available 45
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning | Mar 7, 2025 | RAG Reinforcement Learning (RL) | Code Available 45
Ray: A Distributed Framework for Emerging AI Applications | Dec 16, 2017 | reinforcement-learning Reinforcement Learning | Code Available 45
Pearl: A Production-ready Reinforcement Learning Agent | Dec 6, 2023 | Benchmarking reinforcement-learning | Code Available 45
RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem | Nov 25, 2020 | reinforcement-learning Reinforcement Learning | Code Available 45
Discovering faster matrix multiplication algorithms with reinforcement learning | Oct 5, 2022 | Deep Reinforcement Learning reinforcement-learning | Code Available 45
RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark | Jun 29, 2023 | Combinatorial Optimization Computational Efficiency | Code Available 45
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO | May 22, 2025 | Domain Generalization Image Generation | Code Available 45
DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality | Oct 25, 2022 | Deep Reinforcement Learning GPU | Code Available 45
DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning | Feb 28, 2025 | Information Retrieval reinforcement-learning | Code Available 45
DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments | Apr 4, 2025 | Navigate Prompt Engineering | Code Available 45