Title | Date | Tasks | Code | Stars
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping | May 21, 2025 | Reinforcement Learning (RL) | Code available | 2
A Temporal Difference Method for Stochastic Continuous Dynamics | May 21, 2025 | reinforcement-learning, Reinforcement Learning | Code available | 0
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems | May 21, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
MMaDA: Multimodal Large Diffusion Language Models | May 21, 2025 | Image Generation, Reinforcement Learning (RL) | Code available | 0
Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives | May 21, 2025 | Reinforcement Learning (RL) | Unverified | 0
STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | May 21, 2025 | Efficient Exploration, Reinforcement Learning (RL) | Code available | 0
Multiple Weaks Win Single Strong: Large Language Models Ensemble Weak Reinforcement Learning Agents into a Supreme One | May 21, 2025 | Model Selection, Reinforcement Learning (RL) | Unverified | 0
VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL | May 21, 2025 | Reinforcement Learning (RL) | Unverified | 0
LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models | May 21, 2025 | MuJoCo, Reinforcement Learning (RL) | Unverified | 0
Learning-based Autonomous Oversteer Control and Collision Avoidance | May 21, 2025 | Autonomous Driving, Collision Avoidance | Unverified | 0
ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning | May 21, 2025 | Pseudo Label, Reinforcement Learning (RL) | Unverified | 0
HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving | May 21, 2025 | Autonomous Driving, Hallucination | Unverified | 0
Guided Policy Optimization under Partial Observability | May 21, 2025 | continuous-control, Continuous Control | Code available | 0
Bellman operator convergence enhancements in reinforcement learning algorithms | May 20, 2025 | Acrobot, Decision Making | Unverified | 0
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning | May 20, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0
Self-Evolving Curriculum for LLM Reasoning | May 20, 2025 | Code Generation, Policy Gradient Methods | Unverified | 0
KIPPO: Koopman-Inspired Proximal Policy Optimization | May 20, 2025 | Computational Efficiency, continuous-control | Unverified | 0
Normalized Cut with Reinforcement Learning in Constrained Action Space | May 20, 2025 | Combinatorial Optimization, reinforcement-learning | Unverified | 0
Think-J: Learning to Think for Generative LLM-as-a-Judge | May 20, 2025 | Offline RL, Reinforcement Learning (RL) | Code available | 0
Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning | May 20, 2025 | MMLU, Reinforcement Learning (RL) | Unverified | 0
AAPO: Enhance the Reasoning Capabilities of LLMs with Advantage Momentum | May 20, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Unverified | 0
s3: You Don't Need That Much Data to Train a Search Agent via RL | May 20, 2025 | RAG, Reinforcement Learning (RL) | Code available | 4
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning | May 20, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0
Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models | May 20, 2025 | Medical Visual Question Answering, Question Answering | Unverified | 0
TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning | May 20, 2025 | Math, Reinforcement Learning (RL) | Code available | 1
General-Reasoner: Advancing LLM Reasoning Across All Domains | May 20, 2025 | All, Math | Code available | 3
APEX: Empowering LLMs with Physics-Based Task Planning for Real-time Insight | May 20, 2025 | Causal Inference, Decision Making | Code available | 0
NavBench: A Unified Robotics Benchmark for Reinforcement Learning-Based Autonomous Navigation | May 20, 2025 | Autonomous Navigation, Benchmarking | Unverified | 0
Interpretable Reinforcement Learning for Load Balancing using Kolmogorov-Arnold Networks | May 20, 2025 | Decision Making, Kolmogorov-Arnold Networks | Unverified | 0
Benchmarking MOEAs for solving continuous multi-objective RL problems | May 19, 2025 | Benchmarking, Evolutionary Algorithms | Code available | 0
Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning | May 19, 2025 | D4RL, model | Unverified | 0
Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis | May 19, 2025 | All, Multi-Armed Bandits | Unverified | 0
Effective and Transparent RAG: Adaptive-Reward Reinforcement Learning for Decision Traceability | May 19, 2025 | RAG, Reinforcement Learning (RL) | Code available | 1
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving | May 19, 2025 | Reinforcement Learning (RL) | Unverified | 0
On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding | May 19, 2025 | Code Generation, Code Translation | Unverified | 0
Counterfactual Explanations for Continuous Action Reinforcement Learning | May 19, 2025 | counterfactual, reinforcement-learning | Code available | 0
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs | May 19, 2025 | Reinforcement Learning (RL) | Code available | 1
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning | May 19, 2025 | Language Modeling, Language Modelling | Code available | 2
Your Offline Policy is Not Trustworthy: Bilevel Reinforcement Learning for Sequential Portfolio Optimization | May 19, 2025 | Offline RL, Portfolio Optimization | Unverified | 0
DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management | May 19, 2025 | Management, Reinforcement Learning (RL) | Unverified | 0
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization | May 19, 2025 | Reinforcement Learning (RL) | Unverified | 0
Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs | May 19, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Unverified | 0
Optimizing Anytime Reasoning via Budget Relative Policy Optimization | May 19, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Code available | 2
Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning | May 19, 2025 | D4RL, Model-based Reinforcement Learning | Unverified | 0
Power Allocation for Delay Optimization in Device-to-Device Networks: A Graph Reinforcement Learning Approach | May 19, 2025 | Fairness, Reinforcement Learning (RL) | Unverified | 0
Exploiting Symbolic Heuristics for the Synthesis of Domain-Specific Temporal Planning Guidance using Reinforcement Learning | May 19, 2025 | Reinforcement Learning (RL) | Unverified | 0
ExTrans: Multilingual Deep Reasoning Translation via Exemplar-Enhanced Reinforcement Learning | May 19, 2025 | Machine Translation, reinforcement-learning | Code available | 3
Synthetic Data RL: Task Definition Is All You Need | May 18, 2025 | All, GSM8K | Code available | 2
Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents | May 18, 2025 | Decision Making, Reinforcement Learning (RL) | Unverified | 0
Resolving Latency and Inventory Risk in Market Making with Reinforcement Learning | May 18, 2025 | Reinforcement Learning (RL) | Unverified | 0