The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models | May 30, 2025 | Hallucination Mathematical Reasoning | Code Available 1
ROAD: Responsibility-Oriented Reward Design for Reinforcement Learning in Autonomous Driving | May 30, 2025 | Autonomous Driving Decision Making | Unverified 0
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | May 30, 2025 | Reinforcement Learning (RL) | Code Available 5
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning | May 29, 2025 | reinforcement-learning Reinforcement Learning | Unverified 0
Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation | May 29, 2025 | Form Hallucination | Unverified 0
ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning | May 29, 2025 | Denoising MuJoCo | Unverified 0
Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning | May 29, 2025 | Deep Reinforcement Learning MuJoCo | Unverified 0
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models | May 29, 2025 | 2k 4k | Code Available 1
Fortune: Formula-Driven Reinforcement Learning for Symbolic Table Reasoning in Language Models | May 29, 2025 | Question Answering Reinforcement Learning (RL) | Unverified 0
Hybrid Cross-domain Robust Reinforcement Learning | May 29, 2025 | reinforcement-learning Reinforcement Learning | Unverified 0
Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data | May 29, 2025 | reinforcement-learning Reinforcement Learning | Unverified 0
Diversity-Aware Policy Optimization for Large Language Model Reasoning | May 29, 2025 | Diversity Language Modeling | Unverified 0
Normalizing Flows are Capable Models for RL | May 29, 2025 | Imitation Learning Reinforcement Learning (RL) | Code Available 1
Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability | May 29, 2025 | Math Mathematical Reasoning | Unverified 0
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners | May 29, 2025 | Humanoid Control Language Modeling | Unverified 0
Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles | May 29, 2025 | Reinforcement Learning (RL) | Code Available 1
LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Trainin | May 29, 2025 | GPU Reinforcement Learning (RL) | Unverified 0
Grounded Reinforcement Learning for Visual Reasoning | May 29, 2025 | reinforcement-learning Reinforcement Learning | Code Available 0
ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering | May 29, 2025 | Large Language Model Prompt Engineering | Code Available 2
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization | May 29, 2025 | reinforcement-learning Reinforcement Learning | Unverified 0
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering | May 29, 2025 | Reinforcement Learning (RL) | Code Available 1
Grower-in-the-Loop Interactive Reinforcement Learning for Greenhouse Climate Control | May 29, 2025 | Reinforcement Learning (RL) | Unverified 0
Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization | May 29, 2025 | Reinforcement Learning (RL) | Unverified 0
DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes | May 29, 2025 | Decision Making Reinforcement Learning (RL) | Unverified 0
Unsupervised Transcript-assisted Video Summarization and Highlight Detection | May 29, 2025 | Highlight Detection Reinforcement Learning (RL) | Unverified 0
SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning | May 28, 2025 | Offline RL reinforcement-learning | Code Available 0
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start | May 28, 2025 | Math Multimodal Reasoning | Code Available 1
Maximizing Confidence Alone Improves Reasoning | May 28, 2025 | GSM8K Math | Unverified 0
Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | May 28, 2025 | Math Mathematical Problem-Solving | Code Available 0
Scaling Offline RL via Efficient and Expressive Shortcut Models | May 28, 2025 | Offline RL reinforcement-learning | Unverified 0
When Does Neuroevolution Outcompete Reinforcement Learning in Transfer Learning Tasks? | May 28, 2025 | reinforcement-learning Reinforcement Learning | Code Available 0
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning | May 28, 2025 | Image Segmentation Multimodal Reasoning | Unverified 0
Enhancing Study-Level Inference from Clinical Trial Papers via RL-based Numeric Reasoning | May 28, 2025 | Reinforcement Learning (RL) | Unverified 0
Skywork Open Reasoner 1 Technical Report | May 28, 2025 | Math Reinforcement Learning (RL) | Code Available 4
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO | May 28, 2025 | Math Reinforcement Learning (RL) | Code Available 2
cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning | May 28, 2025 | CAD Reconstruction Large Language Model | Code Available 2
Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games | May 28, 2025 | Decision Making Reinforcement Learning (RL) | Unverified 0
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning | May 28, 2025 | Denoising Reinforcement Learning (RL) | Unverified 0
A Provable Approach for End-to-End Safe Reinforcement Learning | May 28, 2025 | Gaussian Processes Reinforcement Learning (RL) | Unverified 0
FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control | May 28, 2025 | GPU Humanoid Control | Unverified 0
HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym | May 28, 2025 | OpenAI Gym Reinforcement Learning (RL) | Code Available 0
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding | May 27, 2025 | Reinforcement Learning (RL) Video Understanding | Code Available 1
Rendering-Aware Reinforcement Learning for Vector Graphics Generation | May 27, 2025 | Code Generation reinforcement-learning | Unverified 0
Reinforcing General Reasoning without Verifiers | May 27, 2025 | Math Mathematical Reasoning | Code Available 2
SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution | May 27, 2025 | Reinforcement Learning (RL) | Code Available 2
Learning optimal treatment strategies for intraoperative hypotension using deep reinforcement learning | May 27, 2025 | Decision Making Deep Reinforcement Learning | Unverified 0
R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning | May 27, 2025 | Code Generation Reinforcement Learning (RL) | Code Available 1
Interactive OT Gym: A Reinforcement Learning-Based Interactive Optical tweezer (OT)-Driven Microrobotics Simulation Platform | May 27, 2025 | Reinforcement Learning (RL) | Unverified 0
Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies | May 27, 2025 | Protein Design Reinforcement Learning (RL) | Unverified 0
Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL | May 26, 2025 | Reinforcement Learning (RL) Specificity | Code Available 1