LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL | Mar 10, 2025 | Logical Reasoning, Multimodal Reasoning | Code Available | 4
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning | Mar 10, 2025 | Math, Meta Reinforcement Learning | Unverified | 0
Probabilistic Shielding for Safe Reinforcement Learning | Mar 9, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
Agent models: Internalizing Chain-of-Action Generation into Reasoning models | Mar 9, 2025 | Action Generation, Reinforcement Learning (RL) | Code Available | 2
Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models | Mar 9, 2025 | Anomaly Detection, Mamba | Code Available | 0
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | Mar 9, 2025 | Math, Multimodal Reasoning | Code Available | 5
UAV-Assisted Coverage Hole Detection Using Reinforcement Learning in Urban Cellular Networks | Mar 9, 2025 | Reinforcement Learning (RL) | Unverified | 0
A Novel Multi-Objective Reinforcement Learning Algorithm for Pursuit-Evasion Game | Mar 9, 2025 | Multi-Objective Reinforcement Learning, Q-Learning | Unverified | 0
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks | Mar 9, 2025 | Card Games, Diversity | Unverified | 0
Automated Proof of Polynomial Inequalities via Reinforcement Learning | Mar 9, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 0
Dynamic Load Balancing for EV Charging Stations Using Reinforcement Learning and Demand Prediction | Mar 9, 2025 | Graph Neural Network, Reinforcement Learning (RL) | Unverified | 0
ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning | Mar 8, 2025 | Bayesian Optimization, Deep Reinforcement Learning | Unverified | 0
Synergizing AI and Digital Twins for Next-Generation Network Optimization, Forecasting, and Security | Mar 8, 2025 | Federated Learning, Reinforcement Learning (RL) | Unverified | 0
Vairiational Stochastic Games | Mar 8, 2025 | Reinforcement Learning (RL), Variational Inference | Unverified | 0
Policy Constraint by Only Support Constraint for Offline Reinforcement Learning | Mar 7, 2025 | Offline RL, reinforcement-learning | Code Available | 0
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning | Mar 7, 2025 | RAG, Reinforcement Learning (RL) | Code Available | 4
Guaranteeing Out-Of-Distribution Detection in Deep RL via Transition Estimation | Mar 7, 2025 | Deep Reinforcement Learning, Out-of-Distribution Detection | Unverified | 0
Generative Multi-Agent Q-Learning for Policy Optimization: Decentralized Wireless Networks | Mar 7, 2025 | Q-Learning, Reinforcement Learning (RL) | Unverified | 0
Tractable Representations for Convergent Approximation of Distributional HJB Equations | Mar 7, 2025 | Reinforcement Learning (RL) | Unverified | 0
Multi-Fidelity Policy Gradient Algorithms | Mar 7, 2025 | Reinforcement Learning (RL) | Unverified | 0
Multi-Robot Collaboration through Reinforcement Learning and Abstract Simulation | Mar 7, 2025 | Multi-agent Reinforcement Learning, reinforcement-learning | Unverified | 0
Can We Optimize Deep RL Policy Weights as Trajectory Modeling? | Mar 6, 2025 | Deep Reinforcement Learning, Reinforcement Learning (RL) | Unverified | 0
Energy-Weighted Flow Matching for Offline Reinforcement Learning | Mar 6, 2025 | Offline RL, reinforcement-learning | Unverified | 0
Lessons learned from field demonstrations of model predictive control and reinforcement learning for residential and commercial HVAC: A review | Mar 6, 2025 | Model Predictive Control, Reinforcement Learning (RL) | Code Available | 0
Provably Correct Automata Embeddings for Optimal Automata-Conditioned Reinforcement Learning | Mar 6, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models | Mar 6, 2025 | Motion Planning, reinforcement-learning | Unverified | 0
Data-Efficient Learning from Human Interventions for Mobile Robots | Mar 6, 2025 | Imitation Learning, Reinforcement Learning (RL) | Unverified | 0
Rebalanced Multimodal Learning with Data-aware Unimodal Sampling | Mar 5, 2025 | Reinforcement Learning (RL) | Unverified | 0
DreamerV3 for Traffic Signal Control: Hyperparameter Tuning and Performance | Mar 4, 2025 | Reinforcement Learning (RL), Traffic Signal Control | Unverified | 0
Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models | Mar 4, 2025 | Reinforcement Learning (RL) | Unverified | 0
Quantitative Resilience Modeling for Autonomous Cyber Defense | Mar 4, 2025 | Reinforcement Learning (RL) | Unverified | 0
Accelerating Multi-Task Temporal Difference Learning under Low-Rank Representation | Mar 3, 2025 | Reinforcement Learning (RL) | Unverified | 0
What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret | Mar 3, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0
All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning | Mar 3, 2025 | All, Reinforcement Learning (RL) | Unverified | 0
Active Alignments of Lens Systems with Reinforcement Learning | Mar 3, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs | Mar 3, 2025 | Reinforcement Learning (RL) | Code Available | 3
Adversarial Agents: Black-Box Evasion Attacks with Reinforcement Learning | Mar 3, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning | Mar 3, 2025 | Reinforcement Learning (RL) | Code Available | 2
Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models | Mar 2, 2025 | Reinforcement Learning (RL) | Unverified | 0
Minimax Optimal Reinforcement Learning with Quasi-Optimism | Mar 2, 2025 | Computational Efficiency, reinforcement-learning | Unverified | 0
Reinforcement learning with combinatorial actions for coupled restless bandits | Mar 1, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 1
Towards Understanding the Benefit of Multitask Representation Learning in Decision Process | Mar 1, 2025 | Multi-Armed Bandits, Reinforcement Learning (RL) | Unverified | 0
Scalable Reinforcement Learning for Virtual Machine Scheduling | Mar 1, 2025 | Cloud Computing, reinforcement-learning | Unverified | 0
Discrete Codebook World Models for Continuous Control | Mar 1, 2025 | continuous-control, Continuous Control | Code Available | 1
Never too Prim to Swim: An LLM-Enhanced RL-based Adaptive S-Surface Controller for AUVs under Extreme Sea Conditions | Mar 1, 2025 | Language Modeling, Language Modelling | Unverified | 0
What Makes a Good Diffusion Planner for Decision Making? | Mar 1, 2025 | Action Generation, Decision Making | Code Available | 2
Adaptive Reinforcement Learning for State Avoidance in Discrete Event Systems | Feb 28, 2025 | Decision Making, Reinforcement Learning (RL) | Unverified | 0
DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning | Feb 28, 2025 | Information Retrieval, reinforcement-learning | Code Available | 4
Subtask-Aware Visual Reward Learning from Segmented Demonstrations | Feb 28, 2025 | Contrastive Learning, Reinforcement Learning (RL) | Unverified | 0
Hierarchical and Modular Network on Non-prehensile Manipulation in General Environments | Feb 28, 2025 | Deep Reinforcement Learning, Reinforcement Learning (RL) | Unverified | 0