MedDreamer: Model-Based Reinforcement Learning with Latent Imagination on Complex EHRs for Clinical Decision Support | May 26, 2025 | Imputation Model-based Reinforcement Learning
Unverified | 0 | Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback | May 26, 2025 | reinforcement-learning Reinforcement Learning
Unverified | 0 | Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL | May 26, 2025 | D4RL Offline RL
Code Available | 0 | Fox in the Henhouse: Supply-Chain Backdoor Attacks Against Reinforcement Learning | May 26, 2025 | Reinforcement Learning (RL)
Unverified | 0 | Surrogate-Assisted Evolutionary Reinforcement Learning Based on Autoencoder and Hyperbolic Neural Network | May 26, 2025 | Evolutionary Algorithms MuJoCo
Unverified | 0 | Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model | May 26, 2025 | Diagnostic Reinforcement Learning (RL)
Code Available | 0 | What Can RL Bring to VLA Generalization? An Empirical Study | May 26, 2025 | Reinforcement Learning (RL) Vision-Language-Action
Unverified | 0 | Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition | May 26, 2025 | Math Reinforcement Learning (RL)
Unverified | 0 | MT^3: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning | May 26, 2025 | document understanding Machine Translation
Unverified | 0 | SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond | May 26, 2025 | Logical Reasoning Reinforcement Learning (RL)
Code Available | 2 | Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration | May 26, 2025 | Domain Generalization Hallucination
Code Available | 2 | Incentivizing Reasoning from Weak Supervision | May 26, 2025 | reinforcement-learning Reinforcement Learning
Code Available | 0 | DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue | May 26, 2025 | Diagnostic Question Answering
Code Available | 2 | Interleaved Reasoning for Large Language Models via Reinforcement Learning | May 26, 2025 | Logical Reasoning Math
Unverified | 0 | TeViR: Text-to-Video Reward with Diffusion Models for Efficient Reinforcement Learning | May 26, 2025 | reinforcement-learning Reinforcement Learning
Unverified | 0 | VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning | May 26, 2025 | Large Language Model Reinforcement Learning (RL)
Unverified | 0 | DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning | May 26, 2025 | Efficient Exploration reinforcement-learning
Code Available | 0 | Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning | May 26, 2025 | Denoising reinforcement-learning
Code Available | 0 | MASKSEARCH: A Universal Pre-Training Framework to Enhance Agentic Search Capability | May 26, 2025 | Multi-hop Question Answering Question Answering
Code Available | 2 | A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning | May 25, 2025 | Reinforcement Learning (RL)
Code Available | 0 | SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data | May 25, 2025 | reinforcement-learning Reinforcement Learning
Code Available | 1 | Structured Reinforcement Learning for Combinatorial Decision-Making | May 25, 2025 | Combinatorial Optimization Decision Making
Code Available | 1 | Reduce Computational Cost In Deep Reinforcement Learning Via Randomized Policy Learning | May 25, 2025 | Deep Reinforcement Learning Reinforcement Learning (RL)
Unverified | 0 | Semi-pessimistic Reinforcement Learning | May 25, 2025 | reinforcement-learning Reinforcement Learning
Unverified | 0 | Reinforced Latent Reasoning for LLM-based Recommendation | May 25, 2025 | Recommendation Systems Reinforcement Learning (RL)
Unverified | 0 | VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization | May 25, 2025 | Reinforcement Learning (RL)
Code Available | 0 | TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis | May 25, 2025 | CPU GPU
Unverified | 0 | The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training | May 25, 2025 | Reinforcement Learning (RL) Token Reduction
Unverified | 0 | Step-level Reward for Free in RL-based T2I Diffusion Model Fine-tuning | May 25, 2025 | Denoising Reinforcement Learning (RL)
Code Available | 1 | SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards | May 25, 2025 | Image Captioning Multimodal Reasoning
Code Available | 1 | FedORA: Resource Allocation for Federated Learning in ORAN using Radio Intelligent Controllers | May 25, 2025 | Federated Learning Reinforcement Learning (RL)
Unverified | 0 | G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning | May 24, 2025 | Link Prediction Node Classification
Unverified | 0 | GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning | May 24, 2025 | GPU Offline RL
Unverified | 0 | Steering LLM Reasoning Through Bias-Only Adaptation | May 24, 2025 | GSM8K Math
Unverified | 0 | On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization | May 24, 2025 | Math Reinforcement Learning (RL)
Unverified | 0 | Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs | May 24, 2025 | reinforcement-learning Reinforcement Learning
Code Available | 1 | AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting | May 24, 2025 | GSM8K Reinforcement Learning (RL)
Code Available | 0 | Hybrid Latent Reasoning via Reinforcement Learning | May 24, 2025 | reinforcement-learning Reinforcement Learning
Code Available | 0 | Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models | May 24, 2025 | Reinforcement Learning (RL)
Code Available | 0 | VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning | May 24, 2025 | GPU Reinforcement Learning (RL)
Code Available | 3 | Guided by Guardrails: Control Barrier Functions as Safety Instructors for Robotic Learning | May 24, 2025 | Reinforcement Learning (RL)
Unverified | 0 | One Policy but Many Worlds: A Scalable Unified Policy for Versatile Humanoid Locomotion | May 24, 2025 | Humanoid Control Motion Synthesis
Unverified | 0 | Reinforcement Speculative Decoding for Fast Ranking | May 23, 2025 | Information Retrieval Recommendation Systems
Unverified | 0 | WiNGPT-3.0 Technical Report | May 23, 2025 | Diagnostic MedQA
Code Available | 0 | One RL to See Them All: Visual Triple Unified Reinforcement Learning | May 23, 2025 | All Math
Unverified | 0 | QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning | May 23, 2025 | Question Answering Reinforcement Learning (RL)
Code Available | 4 | Diffusion Self-Weighted Guidance for Offline Reinforcement Learning | May 23, 2025 | Offline RL reinforcement-learning
Unverified | 0 | Alignment and Safety of Diffusion Models via Reinforcement Learning and Reward Modeling: A Survey | May 23, 2025 | Active Learning Reinforcement Learning (RL)
Unverified | 0 | Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning | May 23, 2025 | Math Reinforcement Learning (RL)
Code Available | 1 | Thinking Fast and Right: Balancing Accuracy and Reasoning Length with Adaptive Rewards | May 23, 2025 | Reinforcement Learning (RL)
Code Available | 0