IRanker: Towards Ranking Foundation Model | Jun 25, 2025 | GSM8K, model | Code Available | 1
Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance | Jun 25, 2025 | Reinforcement Learning (RL) | Code Available | 0
Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control | Jun 25, 2025 | Bayesian Optimization, Reinforcement Learning (RL) | Code Available | 0
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | Jun 25, 2025 | Code Generation, Denoising | Code Available | 4
Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards | Jun 25, 2025 | Reinforcement Learning (RL) | Unverified | 0
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling | Jun 25, 2025 | Language Modeling, Language Modelling | Code Available | 2
Partially Observable Residual Reinforcement Learning for PV-Inverter-Based Voltage Control in Distribution Grids | Jun 24, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 0
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Jun 24, 2025 | Hallucination, Hallucination Evaluation | Code Available | 1
A Comparative Analysis of Reinforcement Learning and Conventional Deep Learning Approaches for Bearing Fault Diagnosis | Jun 24, 2025 | Diagnostic, Fault Diagnosis | Unverified | 0
Causal-Aware Intelligent QoE Optimization for VR Interaction with Adaptive Keyframe Extraction | Jun 24, 2025 | Causal Inference, CPU | Unverified | 0
Hierarchical Reinforcement Learning and Value Optimization for Challenging Quadruped Locomotion | Jun 24, 2025 | Hierarchical Reinforcement Learning, reinforcement-learning | Unverified | 0
Robots and Children that Learn Together: Improving Knowledge Retention by Teaching Peer-Like Interactive Robots | Jun 23, 2025 | Memorization, Reinforcement Learning (RL) | Unverified | 0
AdapThink: Adaptive Thinking Preferences for Reasoning Language Model | Jun 23, 2025 | Diversity, Language Modeling | Unverified | 0
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning | Jun 23, 2025 | Reinforcement Learning (RL), Text Generation | Code Available | 5
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Jun 23, 2025 | GPU, Large Language Model | Code Available | 2
Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities | Jun 22, 2025 | Reinforcement Learning (RL) | Code Available | 2
Accelerating Residual Reinforcement Learning with Uncertainty Estimation | Jun 21, 2025 | D4RL, reinforcement-learning | Unverified | 0
Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking | Jun 21, 2025 | Benchmarking, Reinforcement Learning (RL) | Unverified | 0
Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation | Jun 20, 2025 | Reinforcement Learning (RL) | Code Available | 0
Learning Dexterous Object Handover | Jun 20, 2025 | Object, Reinforcement Learning (RL) | Unverified | 0
Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity | Jun 20, 2025 | continuous-control, Continuous Control | Code Available | 0
Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations | Jun 19, 2025 | Reinforcement Learning (RL) | Unverified | 0
Multi-Task Lifelong Reinforcement Learning for Wireless Sensor Networks | Jun 19, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
From General to Targeted Rewards: Surpassing GPT-4 in Open-Ended Long-Context Generation | Jun 19, 2025 | Dataset Generation, Reinforcement Learning (RL) | Unverified | 0
VRAIL: Vectorized Reward-based Attribution for Interpretable Learning | Jun 19, 2025 | Reinforcement Learning (RL) | Unverified | 0
Reinforcement Learning-Based Policy Optimisation For Heterogeneous Radio Access | Jun 18, 2025 | Q-Learning, reinforcement-learning | Unverified | 0
Make Your AUV Adaptive: An Environment-Aware Reinforcement Learning Framework For Underwater Tasks | Jun 18, 2025 | Decision Making, Language Modeling | Unverified | 0
Steering Your Diffusion Policy with Latent Space Reinforcement Learning | Jun 18, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study | Jun 18, 2025 | Earth Observation, Management | Unverified | 0
PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning | Jun 17, 2025 | General Reinforcement Learning, Multimodal Reasoning | Unverified | 0
HiLight: A Hierarchical Reinforcement Learning Framework with Global Adversarial Guidance for Large-Scale Traffic Signal Control | Jun 17, 2025 | Hierarchical Reinforcement Learning, reinforcement-learning | Unverified | 0
Reasoning with Exploration: An Entropy Perspective | Jun 17, 2025 | Reinforcement Learning (RL) | Unverified | 0
Unsupervised Skill Discovery through Skill Regions Differentiation | Jun 17, 2025 | Density Estimation, Reinforcement Learning (RL) | Unverified | 0
IntelliLung: Advancing Safe Mechanical Ventilation using Offline RL with Hybrid Actions and Clinically Aligned Rewards | Jun 17, 2025 | Offline RL, Reinforcement Learning (RL) | Unverified | 0
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | Jun 17, 2025 | Data Integration, Large Language Model | Unverified | 0
Zeroth-Order Optimization is Secretly Single-Step Policy Optimization | Jun 17, 2025 | Reinforcement Learning (RL) | Unverified | 0
Adaptive Reinforcement Learning for Unobservable Random Delays | Jun 17, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning | Jun 16, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Code Available | 1
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy | Jun 16, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | Jun 16, 2025 | Mixture-of-Experts, Reinforcement Learning (RL) | Code Available | 7
RL-Guided MPC for Autonomous Greenhouse Control | Jun 16, 2025 | Model Predictive Control, Reinforcement Learning (RL) | Unverified | 0
The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning | Jun 16, 2025 | Deep Reinforcement Learning, MuJoCo | Unverified | 0
TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning | Jun 16, 2025 | Reinforcement Learning (RL), Time Series | Code Available | 2
Value-Free Policy Optimization via Reward Partitioning | Jun 16, 2025 | Language Modeling, Language Modelling | Code Available | 0
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning | Jun 16, 2025 | Reinforcement Learning (RL) | Unverified | 0
A Production Scheduling Framework for Reinforcement Learning Under Real-World Constraints | Jun 16, 2025 | Job Shop Scheduling, Reinforcement Learning (RL) | Code Available | 1
StaQ it! Growing neural networks for Policy Mirror Descent | Jun 16, 2025 | Reinforcement Learning (RL) | Unverified | 0
ReinDSplit: Reinforced Dynamic Split Learning for Pest Recognition in Precision Agriculture | Jun 16, 2025 | Q-Learning, Reinforcement Learning (RL) | Unverified | 0
Can you see how I learn? Human observers' inferences about Reinforcement Learning agents' learning processes | Jun 16, 2025 | Reinforcement Learning (RL) | Unverified | 0
Overcoming Overfitting in Reinforcement Learning via Gaussian Process Diffusion Policy | Jun 16, 2025 | GPR, Reinforcement Learning (RL) | Code Available | 0