ReinDSplit: Reinforced Dynamic Split Learning for Pest Recognition in Precision Agriculture | Jun 16, 2025 | Q-Learning Reinforcement Learning (RL) | Unverified 0
Socratic RL: A Novel Framework for Efficient Knowledge Acquisition through Iterative Reflection and Viewpoint Distillation | Jun 16, 2025 | Meta-Learning reinforcement-learning | Unverified 0
CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making | Jun 15, 2025 | Answer Generation Decision Making | Unverified 0
Federated Neuroevolution O-RAN: Enhancing the Robustness of Deep Reinforcement Learning xApps | Jun 15, 2025 | Deep Reinforcement Learning Reinforcement Learning (RL) | Unverified 0
Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models | Jun 15, 2025 | Reinforcement Learning (RL) | Code Available 2
SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models | Jun 15, 2025 | Logical Reasoning Reinforcement Learning (RL) | Code Available 5
MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval | Jun 14, 2025 | Instruction Following Multimodal Reasoning | Code Available 0
DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty | Jun 14, 2025 | continuous-control Continuous Control | Code Available 0
Eliciting Reasoning in Language Models with Cognitive Tools | Jun 13, 2025 | Mathematical Reasoning Reinforcement Learning (RL) | Unverified 0
Automated Treatment Planning for Interstitial HDR Brachytherapy for Locally Advanced Cervical Cancer using Deep Reinforcement Learning | Jun 13, 2025 | Deep Reinforcement Learning Reinforcement Learning (RL) | Unverified 0
ReVeal: Self-Evolving Code Agents via Iterative Generation-Verification | Jun 13, 2025 | Code Generation reinforcement-learning | Unverified 0
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Jun 13, 2025 | Math reinforcement-learning | Code Available 2
Visual Pre-Training on Unlabeled Images using Reinforcement Learning | Jun 13, 2025 | reinforcement-learning Reinforcement Learning | Code Available 1
LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment | Jun 13, 2025 | GSM8K Mathematical Reasoning | Unverified 0
Shapley Machine: A Game-Theoretic Framework for N-Agent Ad Hoc Teamwork | Jun 12, 2025 | Reinforcement Learning (RL) | Code Available 0
Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization | Jun 12, 2025 | Reinforcement Learning (RL) | Code Available 0
PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier | Jun 12, 2025 | Reinforcement Learning (RL) | Unverified 0
Magistral | Jun 12, 2025 | Instruction Following Reinforcement Learning (RL) | Unverified 0
RePO: Replay-Enhanced Policy Optimization | Jun 11, 2025 | Math Mathematical Reasoning | Code Available 1
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs | Jun 11, 2025 | Code Generation Diagnostic | Code Available 1
Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy | Jun 11, 2025 | Anatomy Reinforcement Learning (RL) | Unverified 0
Attention on flow control: transformer-based reinforcement learning for lift regulation in highly disturbed flows | Jun 11, 2025 | Attribute Pitch control | Unverified 0
A Survey on the Role of Artificial Intelligence and Machine Learning in 6G-V2X Applications | Jun 11, 2025 | Autonomous Vehicles Federated Learning | Unverified 0
Bridging Continuous-time LQR and Reinforcement Learning via Gradient Flow of the Bellman Error | Jun 11, 2025 | Reinforcement Learning (RL) | Unverified 0
Optimal Operating Strategy for PV-BESS Households: Balancing Self-Consumption and Self-Sufficiency | Jun 10, 2025 | Model Predictive Control Reinforcement Learning (RL) | Unverified 0
TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization | Jun 10, 2025 | reinforcement-learning Reinforcement Learning | Code Available 0
Policy-Based Trajectory Clustering in Offline Reinforcement Learning | Jun 10, 2025 | Clustering D4RL | Unverified 0
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood | Jun 10, 2025 | Computational Efficiency D4RL | Code Available 0
DeepForm: Reasoning Large Language Model for Communication System Formulation | Jun 10, 2025 | Language Modeling Language Modelling | Unverified 0
Exploration by Random Reward Perturbation | Jun 10, 2025 | Diversity Reinforcement Learning (RL) | Unverified 0
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning | Jun 10, 2025 | Large Language Model reinforcement-learning | Code Available 1
SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning | Jun 10, 2025 | Reinforcement Learning (RL) | Code Available 1
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning | Jun 10, 2025 | Model Selection Reinforcement Learning (RL) | Code Available 2
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling | Jun 10, 2025 | Computational Efficiency Reinforcement Learning (RL) | Code Available 1
Robust Evolutionary Multi-Objective Network Architecture Search for Reinforcement Learning (EMNAS-RL) | Jun 10, 2025 | Autonomous Driving Reinforcement Learning (RL) | Unverified 0
MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning | Jun 10, 2025 | All graph construction | Unverified 0
How to Provably Improve Return Conditioned Supervised Learning? | Jun 10, 2025 | Decision Making Offline RL | Unverified 0
Reinforcement Learning Teachers of Test Time Scaling | Jun 10, 2025 | reinforcement-learning Reinforcement Learning | Unverified 0
Intention-Conditioned Flow Occupancy Models | Jun 10, 2025 | Reinforcement Learning (RL) | Code Available 1
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO | Jun 9, 2025 | Data Augmentation Large Language Model | Unverified 0
Play to Generalize: Learning to Reason Through Game Play | Jun 9, 2025 | Domain Generalization Math | Code Available 2
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions | Jun 9, 2025 | Large Language Model Reinforcement Learning (RL) | Code Available 2
Through the Valley: Path to Effective Long CoT Training for Small Language Models | Jun 9, 2025 | 8k Reinforcement Learning (RL) | Unverified 0
Decentralizing Multi-Agent Reinforcement Learning with Temporal Causal Information | Jun 9, 2025 | Multi-agent Reinforcement Learning reinforcement-learning | Unverified 0
Reinforcement Pre-Training | Jun 9, 2025 | Language Modeling Language Modelling | Unverified 0
AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking | Jun 9, 2025 | Reinforcement Learning (RL) | Unverified 0
LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement | Jun 9, 2025 | Decision Making Reinforcement Learning (RL) | Unverified 0
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Jun 9, 2025 | Math Mathematical Reasoning | Code Available 1
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction | Jun 9, 2025 | Reinforcement Learning (RL) | Code Available 2
Compound AI Systems Optimization: A Survey of Methods, Challenges, and Future Directions | Jun 9, 2025 | Reinforcement Learning (RL) | Code Available 1