Co-Reinforcement Learning for Unified Multimodal Understanding and Generation | May 23, 2025 | Image Generation, reinforcement-learning | Code Available | 1
Reinforcement Learning for Ballbot Navigation in Uneven Terrain | May 23, 2025 | MuJoCo, reinforcement-learning | Code Available | 1
The Cell Must Go On: Agar.io for Continual Reinforcement Learning | May 23, 2025 | Continual Learning, Deep Reinforcement Learning | Code Available | 1
Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL | May 22, 2025 | Natural Language Understanding, Reinforcement Learning (RL) | Code Available | 3
RAP: Runtime-Adaptive Pruning for LLM Inference | May 22, 2025 | Reinforcement Learning (RL) | Unverified | 0
Backdoors in DRL: Four Environments Focusing on In-distribution Triggers | May 22, 2025 | Backdoor Attack, Data Poisoning | Unverified | 0
Control of Renewable Energy Communities using AI and Real-World Data | May 22, 2025 | Data Integration, energy management | Unverified | 0
DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation | May 22, 2025 | Language Modeling, Language Modelling | Unverified | 0
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward | May 22, 2025 | Reinforcement Learning (RL) | Code Available | 2
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO | May 22, 2025 | Domain Generalization, Image Generation | Code Available | 4
LARES: Latent Reasoning for Sequential Recommendation | May 22, 2025 | Recommendation Systems, Reinforcement Learning (RL) | Unverified | 0
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models | May 22, 2025 | Reinforcement Learning (RL) | Code Available | 1
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning | May 22, 2025 | Math, reinforcement-learning | Unverified | 0
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains | May 22, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Unverified | 0
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO | May 22, 2025 | Reinforcement Learning (RL) | Code Available | 3
Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning | May 22, 2025 | Reinforcement Learning (RL) | Unverified | 0
SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning | May 22, 2025 | Language Modeling, Language Modelling | Code Available | 0
SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation | May 22, 2025 | Machine Translation, Reinforcement Learning (RL) | Code Available | 0
Dynamic Sampling that Adapts: Iterative DPO for Self-Aware Mathematical Reasoning | May 22, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Unverified | 0
Reinforcement Learning for Stock Transactions | May 22, 2025 | Q-Learning, reinforcement-learning | Unverified | 0
PyTupli: A Scalable Infrastructure for Collaborative Offline Reinforcement Learning Projects | May 22, 2025 | Offline RL, Reinforcement Learning (RL) | Code Available | 0
Reward-Aware Proto-Representations in Reinforcement Learning | May 22, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning | May 22, 2025 | Math, Reinforcement Learning (RL) | Code Available | 2
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models | May 22, 2025 | Reinforcement Learning (RL) | Code Available | 2
Strategically Linked Decisions in Long-Term Planning and Reinforcement Learning | May 22, 2025 | Reinforcement Learning (RL) | Unverified | 0
Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning | May 22, 2025 | Reinforcement Learning (RL) | Unverified | 0
Divide-Fuse-Conquer: Eliciting "Aha Moments" in Multi-Scenario Games | May 22, 2025 | Reinforcement Learning (RL) | Unverified | 0
ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay | May 22, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 2
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning | May 22, 2025 | Reinforcement Learning (RL) | Code Available | 3
Meta-reinforcement learning with minimum attention | May 22, 2025 | Meta-Learning, Meta Reinforcement Learning | Unverified | 0
Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2) | May 22, 2025 | Autonomous Driving, Bench2Drive | Unverified | 0
SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | May 22, 2025 | Bug fixing, Chatbot | Code Available | 2
Find the Fruit: Designing a Zero-Shot Sim2Real Deep RL Planner for Occlusion Aware Plant Manipulation | May 22, 2025 | Deep Reinforcement Learning, Reinforcement Learning (RL) | Unverified | 0
VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving | May 22, 2025 | Autonomous Driving, Reinforcement Learning (RL) | Unverified | 0
Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only | May 22, 2025 | Imitation Learning, Offline RL | Unverified | 0
Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies | May 22, 2025 | Offline RL, Q-Learning | Unverified | 0
Reward Is Enough: LLMs Are In-Context Reinforcement Learners | May 21, 2025 | Large Language Model, Reinforcement Learning (RL) | Unverified | 0
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning | May 21, 2025 | Reinforcement Learning (RL), Visual Reasoning | Unverified | 0
GRIT: Teaching MLLMs to Think with Images | May 21, 2025 | Reinforcement Learning (RL), Visual Reasoning | Unverified | 0
RLBenchNet: The Right Network for the Right Reinforcement Learning Task | May 21, 2025 | continuous-control, Continuous Control | Code Available | 1
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning | May 21, 2025 | Question Answering, Reinforcement Learning (RL) | Code Available | 1
Multiple Weaks Win Single Strong: Large Language Models Ensemble Weak Reinforcement Learning Agents into a Supreme One | May 21, 2025 | Model Selection, Reinforcement Learning (RL) | Unverified | 0
MMaDA: Multimodal Large Diffusion Language Models | May 21, 2025 | Image Generation, Reinforcement Learning (RL) | Code Available | 0
An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents | May 21, 2025 | Reinforcement Learning (RL) | Code Available | 7
VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL | May 21, 2025 | Reinforcement Learning (RL) | Unverified | 0
StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization | May 21, 2025 | Question Answering, Reinforcement Learning (RL) | Code Available | 0
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities | May 21, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0
ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning | May 21, 2025 | Pseudo Label, Reinforcement Learning (RL) | Unverified | 0
STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | May 21, 2025 | Efficient Exploration, Reinforcement Learning (RL) | Code Available | 0
Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives | May 21, 2025 | Reinforcement Learning (RL) | Unverified | 0