Control of Renewable Energy Communities using AI and Real-World Data | May 22, 2025 | Data Integration, energy management | Unverified | 0
Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only | May 22, 2025 | Imitation Learning, Offline RL | Unverified | 0
DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation | May 22, 2025 | Language Modeling, Language Modelling | Unverified | 0
Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning | May 22, 2025 | Reinforcement Learning (RL) | Unverified | 0
Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning | May 22, 2025 | Reinforcement Learning (RL) | Unverified | 0
LARES: Latent Reasoning for Sequential Recommendation | May 22, 2025 | Recommendation Systems, Reinforcement Learning (RL) | Unverified | 0
Divide-Fuse-Conquer: Eliciting "Aha Moments" in Multi-Scenario Games | May 22, 2025 | Reinforcement Learning (RL) | Unverified | 0
Dynamic Sampling that Adapts: Iterative DPO for Self-Aware Mathematical Reasoning | May 22, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Unverified | 0
RAP: Runtime-Adaptive Pruning for LLM Inference | May 22, 2025 | Reinforcement Learning (RL) | Unverified | 0
SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning | May 22, 2025 | Language Modeling, Language Modelling | Code Available | 0
Strategically Linked Decisions in Long-Term Planning and Reinforcement Learning | May 22, 2025 | Reinforcement Learning (RL) | Unverified | 0
Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies | May 22, 2025 | Offline RL, Q-Learning | Unverified | 0
Meta-reinforcement learning with minimum attention | May 22, 2025 | Meta-Learning, Meta Reinforcement Learning | Unverified | 0
Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2) | May 22, 2025 | Autonomous Driving, Bench2Drive | Unverified | 0
HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving | May 21, 2025 | Autonomous Driving, Hallucination | Unverified | 0
MMaDA: Multimodal Large Diffusion Language Models | May 21, 2025 | Image Generation, Reinforcement Learning (RL) | Code Available | 0
Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives | May 21, 2025 | Reinforcement Learning (RL) | Unverified | 0
GRIT: Teaching MLLMs to Think with Images | May 21, 2025 | Reinforcement Learning (RL), Visual Reasoning | Unverified | 0
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities | May 21, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0
LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models | May 21, 2025 | MuJoCo, Reinforcement Learning (RL) | Unverified | 0
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models | May 21, 2025 | Benchmarking, Reinforcement Learning (RL) | Unverified | 0
A Temporal Difference Method for Stochastic Continuous Dynamics | May 21, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 0
Multiple Weaks Win Single Strong: Large Language Models Ensemble Weak Reinforcement Learning Agents into a Supreme One | May 21, 2025 | Model Selection, Reinforcement Learning (RL) | Unverified | 0
When Can Large Reasoning Models Save Thinking? Mechanistic Analysis of Behavioral Divergence in Reasoning | May 21, 2025 | Reinforcement Learning (RL) | Unverified | 0
ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning | May 21, 2025 | Pseudo Label, Reinforcement Learning (RL) | Unverified | 0
Learning-based Autonomous Oversteer Control and Collision Avoidance | May 21, 2025 | Autonomous Driving, Collision Avoidance | Unverified | 0
Guided Policy Optimization under Partial Observability | May 21, 2025 | continuous-control, Continuous Control | Code Available | 0
Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL | May 21, 2025 | 4k, Multimodal Reasoning | Unverified | 0
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning | May 21, 2025 | Language Modeling, Language Modelling | Unverified | 0
Reward Is Enough: LLMs Are In-Context Reinforcement Learners | May 21, 2025 | Large Language Model, Reinforcement Learning (RL) | Unverified | 0
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems | May 21, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization | May 21, 2025 | Question Answering, Reinforcement Learning (RL) | Code Available | 0
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning | May 21, 2025 | Reinforcement Learning (RL), Visual Reasoning | Unverified | 0
VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL | May 21, 2025 | Reinforcement Learning (RL) | Unverified | 0
STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | May 21, 2025 | Efficient Exploration, Reinforcement Learning (RL) | Code Available | 0
KIPPO: Koopman-Inspired Proximal Policy Optimization | May 20, 2025 | Computational Efficiency, continuous-control | Unverified | 0
NavBench: A Unified Robotics Benchmark for Reinforcement Learning-Based Autonomous Navigation | May 20, 2025 | Autonomous Navigation, Benchmarking | Unverified | 0
AAPO: Enhance the Reasoning Capabilities of LLMs with Advantage Momentum | May 20, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Unverified | 0
Normalized Cut with Reinforcement Learning in Constrained Action Space | May 20, 2025 | Combinatorial Optimization, reinforcement-learning | Unverified | 0
APEX: Empowering LLMs with Physics-Based Task Planning for Real-time Insight | May 20, 2025 | Causal Inference, Decision Making | Code Available | 0
Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning | May 20, 2025 | MMLU, Reinforcement Learning (RL) | Unverified | 0
Think-J: Learning to Think for Generative LLM-as-a-Judge | May 20, 2025 | Offline RL, Reinforcement Learning (RL) | Code Available | 0
Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models | May 20, 2025 | Medical Visual Question Answering, Question Answering | Unverified | 0
Interpretable Reinforcement Learning for Load Balancing using Kolmogorov-Arnold Networks | May 20, 2025 | Decision Making, Kolmogorov-Arnold Networks | Unverified | 0
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning | May 20, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning | May 20, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0
Bellman operator convergence enhancements in reinforcement learning algorithms | May 20, 2025 | Acrobot, Decision Making | Unverified | 0
Self-Evolving Curriculum for LLM Reasoning | May 20, 2025 | Code Generation, Policy Gradient Methods | Unverified | 0
Exploiting Symbolic Heuristics for the Synthesis of Domain-Specific Temporal Planning Guidance using Reinforcement Learning | May 19, 2025 | Reinforcement Learning (RL) | Unverified | 0
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving | May 19, 2025 | Reinforcement Learning (RL) | Unverified | 0