Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem Jun 3, 2025 GPU Math
Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback Jun 3, 2025 Reinforcement Learning (RL)
Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains Jun 2, 2025 Math Reinforcement Learning (RL)
Trajectory First: A Curriculum for Discovering Diverse Policies Jun 2, 2025 Diversity Reinforcement Learning (RL)
KDRL: Post-Training Reasoning LLMs via Unified Knowledge Distillation and Reinforcement Learning Jun 2, 2025 Knowledge Distillation Large Language Model
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Jun 2, 2025 Multimodal Reasoning reinforcement-learning
Data-assimilated model-informed reinforcement learning Jun 2, 2025 model reinforcement-learning
A Reinforcement Learning Approach for RIS-aided Fair Communications Jun 1, 2025 Fairness reinforcement-learning
DriveMind: A Dual-VLM based Reinforcement Learning Framework for Autonomous Driving Jun 1, 2025 Autonomous Driving Decoder
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning May 31, 2025 Diagnostic Reinforcement Learning (RL)
ARIA: Training Language Agents with Intention-Driven Reward Aggregation May 31, 2025 Decision Making Reinforcement Learning (RL)
Reinforcement Learning for Hanabi May 31, 2025 Card Games Deep Reinforcement Learning
How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning May 30, 2025 ARC Reinforcement Learning (RL)
Balancing Profit and Fairness in Risk-Based Pricing Markets May 30, 2025 Fairness Reinforcement Learning (RL)
Reason-SVG: Hybrid Reward RL for Aha-Moments in Vector Graphics Generation May 30, 2025 Reinforcement Learning (RL) Vector Graphics
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models May 30, 2025 Math Multiple-choice
ROAD: Responsibility-Oriented Reward Design for Reinforcement Learning in Autonomous Driving May 30, 2025 Autonomous Driving Decision Making
MOFGPT: Generative Design of Metal-Organic Frameworks using Language Models May 30, 2025 reinforcement-learning Reinforcement Learning
Pangu DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning May 30, 2025 Question Answering Reinforcement Learning (RL)
Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control May 30, 2025 continuous-control Continuous Control
Diversity-Aware Policy Optimization for Large Language Model Reasoning May 29, 2025 Diversity Language Modeling
Grounded Reinforcement Learning for Visual Reasoning May 29, 2025 reinforcement-learning Reinforcement Learning
Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation May 29, 2025 Form Hallucination
LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Training May 29, 2025 GPU Reinforcement Learning (RL)
Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability May 29, 2025 Math Mathematical Reasoning
Hybrid Cross-domain Robust Reinforcement Learning May 29, 2025 reinforcement-learning Reinforcement Learning
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization May 29, 2025 reinforcement-learning Reinforcement Learning
Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data May 29, 2025 reinforcement-learning Reinforcement Learning
DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes May 29, 2025 Decision Making Reinforcement Learning (RL)
Fortune: Formula-Driven Reinforcement Learning for Symbolic Table Reasoning in Language Models May 29, 2025 Question Answering Reinforcement Learning (RL)
ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning May 29, 2025 Denoising MuJoCo
Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning May 29, 2025 Deep Reinforcement Learning MuJoCo
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners May 29, 2025 Humanoid Control Language Modeling
Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization May 29, 2025 Reinforcement Learning (RL)
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning May 29, 2025 reinforcement-learning Reinforcement Learning
Grower-in-the-Loop Interactive Reinforcement Learning for Greenhouse Climate Control May 29, 2025 Reinforcement Learning (RL)
Unsupervised Transcript-assisted Video Summarization and Highlight Detection May 29, 2025 Highlight Detection Reinforcement Learning (RL)
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning May 28, 2025 Image Segmentation Multimodal Reasoning
FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control May 28, 2025 GPU Humanoid Control
Scaling Offline RL via Efficient and Expressive Shortcut Models May 28, 2025 Offline RL reinforcement-learning
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning May 28, 2025 Denoising Reinforcement Learning (RL)
Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games May 28, 2025 Decision Making Reinforcement Learning (RL)
A Provable Approach for End-to-End Safe Reinforcement Learning May 28, 2025 Gaussian Processes Reinforcement Learning (RL)
Enhancing Study-Level Inference from Clinical Trial Papers via RL-based Numeric Reasoning May 28, 2025 Reinforcement Learning (RL)
HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym May 28, 2025 OpenAI Gym Reinforcement Learning (RL)
Maximizing Confidence Alone Improves Reasoning May 28, 2025 GSM8K Math
Decomposing Elements of Problem Solving: What "Math" Does RL Teach? May 28, 2025 Math Mathematical Problem-Solving
SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning May 28, 2025 Offline RL reinforcement-learning
When Does Neuroevolution Outcompete Reinforcement Learning in Transfer Learning Tasks? May 28, 2025 reinforcement-learning Reinforcement Learning
Rendering-Aware Reinforcement Learning for Vector Graphics Generation May 27, 2025 Code Generation reinforcement-learning