From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning | Jul 17, 2025 | D4RL, Offline RL | Unverified | 0
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback | Jul 17, 2025 | EEG, MuJoCo | Unverified | 0
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation | Jul 17, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities | Jul 17, 2025 | Language Modeling, Language Modelling | Unverified | 0
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks | Jul 17, 2025 | Math, Mathematical Reasoning | Unverified | 0
Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) | Jul 17, 2025 | continuous-control, Continuous Control | Unverified | 0
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training | Jul 16, 2025 | Code Generation, Math | Unverified | 0
Kevin: Multi-Turn RL for Generating CUDA Kernels | Jul 16, 2025 | GPU, Reinforcement Learning (RL) | Unverified | 0
Fly, Fail, Fix: Iterative Game Repair with Reinforcement Learning and Large Multimodal Models | Jul 16, 2025 | Game Design, Reinforcement Learning (RL) | Unverified | 0
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing | Jul 15, 2025 | Knowledge Tracing, Math | Code Available | 0
Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning | Jul 15, 2025 | Policy Gradient Methods, reinforcement-learning | Unverified | 0
Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities | Jul 15, 2025 | Reinforcement Learning (RL) | Code Available | 0
Illuminating the Three Dogmas of Reinforcement Learning under Evolutionary Light | Jul 15, 2025 | Reinforcement Learning (RL) | Unverified | 0
High-Throughput Distributed Reinforcement Learning via Adaptive Policy Synchronization | Jul 15, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 0
Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Reinforcement Learning Based UAV Deconfliction | Jul 15, 2025 | Change Point Detection, Reinforcement Learning (RL) | Unverified | 0
Exploring the robustness of TractOracle methods in RL-based tractography | Jul 15, 2025 | Diffusion MRI, reinforcement-learning | Code Available | 0
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Jul 14, 2025 | Math, Mathematical Reasoning | Code Available | 1
Deep Reinforcement Learning with Gradient Eligibility Traces | Jul 12, 2025 | Deep Reinforcement Learning, MuJoCo | Code Available | 1
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Jul 11, 2025 | Math, Mathematical Reasoning | Code Available | 1
Scaling RL to Long Videos | Jul 10, 2025 | Reinforcement Learning (RL), Spatial Reasoning | Code Available | 0
The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs | Jul 10, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Unverified | 0
Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model | Jul 9, 2025 | Language Modeling, Language Modelling | Unverified | 0
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning | Jul 9, 2025 | Reinforcement Learning (RL) | Unverified | 0
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs | Jul 8, 2025 | GPU, reinforcement-learning | Code Available | 2
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning | Jul 8, 2025 | MME, Reinforcement Learning (RL) | Code Available | 2
CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation | Jul 8, 2025 | Reinforcement Learning (RL), TAG | Unverified | 0
FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models | Jul 8, 2025 | Logical Reasoning, Reinforcement Learning (RL) | Unverified | 0
GTA1: GUI Test-time Scaling Agent | Jul 8, 2025 | Reinforcement Learning (RL), Task Planning | Code Available | 2
Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation | Jul 8, 2025 | MuJoCo, Out-of-Distribution Detection | Unverified | 0
Detecting and Mitigating Reward Hacking in Reinforcement Learning Systems: A Comprehensive Empirical Study | Jul 8, 2025 | MuJoCo, Recommendation Systems | Unverified | 0
Robust Bandwidth Estimation for Real-Time Communication with Offline Reinforcement Learning | Jul 8, 2025 | Offline RL, Reinforcement Learning (RL) | Unverified | 0
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning | Jul 7, 2025 | Reinforcement Learning (RL), Visual Reasoning | Unverified | 0
2048: Reinforcement Learning in a Delayed Reward Environment | Jul 7, 2025 | quantile regression, reinforcement-learning | Unverified | 0
Generalized Adaptive Transfer Network: Enhancing Transfer Learning in Reinforcement Learning Across Domains | Jul 2, 2025 | Atari Games, Chatbot | Code Available | 0
Kwai Keye-VL Technical Report | Jul 2, 2025 | Instruction Following, Reinforcement Learning (RL) | Code Available | 4
RAG-R1 : Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism | Jun 30, 2025 | Question Answering, RAG | Code Available | 5
Constructing Non-Markovian Decision Process via History Aggregator | Jun 30, 2025 | Decision Making, Reinforcement Learning (RL) | Code Available | 0
Listener-Rewarded Thinking in VLMs for Image Preferences | Jun 28, 2025 | Memorization, Reinforcement Learning (RL) | Unverified | 0
A Survey of Continual Reinforcement Learning | Jun 27, 2025 | Continual Learning, Decision Making | Unverified | 0
Advancements and Challenges in Continual Reinforcement Learning: A Comprehensive Review | Jun 27, 2025 | Continual Learning, Diversity | Unverified | 0
Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning | Jun 27, 2025 | Foreground Segmentation, object-detection | Code Available | 2
APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization | Jun 26, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Code Available | 0
Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning | Jun 26, 2025 | Decision Making, Hierarchical Reinforcement Learning | Unverified | 0
Optimising 4th-Order Runge-Kutta Methods: A Dynamic Heuristic Approach for Efficiency and Low Storage | Jun 26, 2025 | AutoML, Computational Efficiency | Unverified | 0
Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments | Jun 26, 2025 | Reinforcement Learning (RL), Thompson Sampling | Unverified | 0
Curriculum-Guided Antifragile Reinforcement Learning for Secure UAV Deconfliction under Observation-Space Attacks | Jun 26, 2025 | Decision Making, Reinforcement Learning (RL) | Unverified | 0
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context | Jun 26, 2025 | Large Language Model, Multimodal Reasoning | Code Available | 2
Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games | Jun 26, 2025 | Reinforcement Learning (RL) | Code Available | 0
RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment | Jun 26, 2025 | Reinforcement Learning (RL) | Unverified | 0
Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning | Jun 26, 2025 | Action Generation, Decision Making | Unverified | 0