Eliciting Reasoning in Language Models with Cognitive Tools | Jun 13, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Unverified | 0
PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier | Jun 12, 2025 | Reinforcement Learning (RL) | Unverified | 0
Magistral | Jun 12, 2025 | Instruction Following, Reinforcement Learning (RL) | Unverified | 0
Shapley Machine: A Game-Theoretic Framework for N-Agent Ad Hoc Teamwork | Jun 12, 2025 | Reinforcement Learning (RL) | Code Available | 0
Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization | Jun 12, 2025 | Reinforcement Learning (RL) | Code Available | 0
Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy | Jun 11, 2025 | Anatomy, Reinforcement Learning (RL) | Unverified | 0
A Survey on the Role of Artificial Intelligence and Machine Learning in 6G-V2X Applications | Jun 11, 2025 | Autonomous Vehicles, Federated Learning | Unverified | 0
Attention on flow control: transformer-based reinforcement learning for lift regulation in highly disturbed flows | Jun 11, 2025 | Attribute, Pitch control | Unverified | 0
Bridging Continuous-time LQR and Reinforcement Learning via Gradient Flow of the Bellman Error | Jun 11, 2025 | Reinforcement Learning (RL) | Unverified | 0
Optimal Operating Strategy for PV-BESS Households: Balancing Self-Consumption and Self-Sufficiency | Jun 10, 2025 | Model Predictive Control, Reinforcement Learning (RL) | Unverified | 0
Policy-Based Trajectory Clustering in Offline Reinforcement Learning | Jun 10, 2025 | Clustering, D4RL | Unverified | 0
Exploration by Random Reward Perturbation | Jun 10, 2025 | Diversity, Reinforcement Learning (RL) | Unverified | 0
Reinforcement Learning Teachers of Test Time Scaling | Jun 10, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
TGRPO :Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization | Jun 10, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 0
Robust Evolutionary Multi-Objective Network Architecture Search for Reinforcement Learning (EMNAS-RL) | Jun 10, 2025 | Autonomous Driving, Reinforcement Learning (RL) | Unverified | 0
MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning | Jun 10, 2025 | All, graph construction | Unverified | 0
How to Provably Improve Return Conditioned Supervised Learning? | Jun 10, 2025 | Decision Making, Offline RL | Unverified | 0
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood | Jun 10, 2025 | Computational Efficiency, D4RL | Code Available | 0
DeepForm: Reasoning Large Language Model for Communication System Formulation | Jun 10, 2025 | Language Modeling, Language Modelling | Unverified | 0
Through the Valley: Path to Effective Long CoT Training for Small Language Models | Jun 9, 2025 | 8k, Reinforcement Learning (RL) | Unverified | 0
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO | Jun 9, 2025 | Data Augmentation, Large Language Model | Unverified | 0
AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking | Jun 9, 2025 | Reinforcement Learning (RL) | Unverified | 0
Decentralizing Multi-Agent Reinforcement Learning with Temporal Causal Information | Jun 9, 2025 | Multi-agent Reinforcement Learning, reinforcement-learning | Unverified | 0
Reinforcement Pre-Training | Jun 9, 2025 | Language Modeling, Language Modelling | Unverified | 0
Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning | Jun 9, 2025 | Reinforcement Learning (RL) | Unverified | 0
LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement | Jun 9, 2025 | Decision Making, Reinforcement Learning (RL) | Unverified | 0
QForce-RL: Quantized FPGA-Optimized Reinforcement Learning Compute Engine | Jun 8, 2025 | Decision Making, Quantization | Unverified | 0
Reliable Critics: Monotonic Improvement and Convergence Guarantees for Reinforcement Learning | Jun 8, 2025 | Reinforcement Learning (RL) | Unverified | 0
CARoL: Context-aware Adaptation for Robot Learning | Jun 8, 2025 | Reinforcement Learning (RL) | Unverified | 0
Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning | Jun 8, 2025 | Offline RL, Question Answering | Unverified | 0
Safety-Aware Reinforcement Learning for Control via Risk-Sensitive Action-Value Iteration and Quantile Regression | Jun 8, 2025 | quantile regression, Reinforcement Learning (RL) | Unverified | 0
On the Generalization of Data-Assisted Control in port-Hamiltonian Systems (DAC-pH) | Jun 8, 2025 | parameter estimation, Reinforcement Learning (RL) | Unverified | 0
Prompting Wireless Networks: Reinforced In-Context Learning for Power Control | Jun 6, 2025 | Decision Making, In-Context Learning | Unverified | 0
CodeContests+: High-Quality Test Case Generation for Competitive Programming | Jun 6, 2025 | Reinforcement Learning (RL) | Unverified | 0
Towards Infant Sleep-Optimized Driving: Synergizing Wearable and Vehicle Sensing in Intelligent Cruise Control | Jun 6, 2025 | Reinforcement Learning (RL), Sleep Quality | Unverified | 0
Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning | Jun 6, 2025 | Reinforcement Learning (RL) | Code Available | 0
On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models | Jun 5, 2025 | Instruction Following, Reinforcement Learning (RL) | Unverified | 0
Safe Planning and Policy Optimization via World Model Learning | Jun 5, 2025 | continuous-control, Continuous Control | Unverified | 0
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models | Jun 5, 2025 | All, Math | Unverified | 0
Dissecting Long Reasoning Models: An Empirical Study | Jun 5, 2025 | Reinforcement Learning (RL) | Code Available | 0
Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning | Jun 5, 2025 | Q-Learning, Reinforcement Learning (RL) | Unverified | 0
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning | Jun 5, 2025 | Mathematical Reasoning, Problem Decomposition | Unverified | 0
Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond | Jun 4, 2025 | Arithmetic Reasoning, Reinforcement Learning (RL) | Unverified | 0
A Lyapunov Drift-Plus-Penalty Method Tailored for Reinforcement Learning with Queue Stability | Jun 4, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
CORE: Constraint-Aware One-Step Reinforcement Learning for Simulation-Guided Neural Network Accelerator Design | Jun 4, 2025 | Reinforcement Learning (RL) | Unverified | 0
SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL | Jun 4, 2025 | Disentanglement, Industrial Robots | Unverified | 0
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning | Jun 4, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Unverified | 0
Latent Guided Sampling for Combinatorial Optimization | Jun 4, 2025 | Combinatorial Optimization, Drug Discovery | Code Available | 0
Joint Modeling for Learning Decision-Making Dynamics in Behavioral Experiments | Jun 3, 2025 | Decision Making, Reinforcement Learning (RL) | Unverified | 0
Learned Controllers for Agile Quadrotors in Pursuit-Evasion Games | Jun 3, 2025 | Continual Learning, Reinforcement Learning (RL) | Unverified | 0