OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning | May 13, 2025 | Reinforcement Learning (RL), Visual Reasoning | Code Available | 3
Modeling Unseen Environments with Language-guided Composable Causal Components in Reinforcement Learning | May 13, 2025 | Meta-Learning, Reinforcement Learning (RL) | Unverified | 0
Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles | May 13, 2025 | Autonomous Vehicles, GPU | Unverified | 0
The Exploratory Multi-Asset Mean-Variance Portfolio Selection using Reinforcement Learning | May 12, 2025 | Reinforcement Learning (RL) | Unverified | 0
DARLR: Dual-Agent Offline Reinforcement Learning for Recommender Systems with Dynamic Reward | May 12, 2025 | Recommendation Systems, Reinforcement Learning (RL) | Code Available | 0
Combining Bayesian Inference and Reinforcement Learning for Agent Decision Making: A Review | May 12, 2025 | Active Learning, Bayesian Inference | Unverified | 0
Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning | May 12, 2025 | Language Modeling, Language Modelling | Code Available | 1
Measuring General Intelligence with Generated Games | May 12, 2025 | In-Context Learning, Large Language Model | Code Available | 1
DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation | May 12, 2025 | Language Modeling, Language Modelling | Code Available | 2
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | May 12, 2025 | Math, Mathematical Problem-Solving | Code Available | 2
INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning | May 12, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning | May 12, 2025 | Image Generation, Reinforcement Learning (RL) | Unverified | 0
DanceGRPO: Unleashing GRPO on Visual Generation | May 12, 2025 | Denoising, reinforcement-learning | Code Available | 5
Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains | May 12, 2025 | continuous-control, Continuous Control | Unverified | 0
Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent | May 12, 2025 | RAG, Reinforcement Learning (RL) | Code Available | 2
Design and Experimental Test of Datatic Approximate Optimal Filter in Nonlinear Dynamic Systems | May 11, 2025 | Computational Efficiency, Reinforcement Learning (RL) | Unverified | 0
FACET: Force-Adaptive Control via Impedance Reference Tracking for Legged Robots | May 11, 2025 | Reinforcement Learning (RL) | Unverified | 0
Learning Value of Information towards Joint Communication and Control in 6G V2X | May 11, 2025 | Autonomous Vehicles, Decision Making | Unverified | 0
Reinforcement Learning (RL) Meets Urban Climate Modeling: Investigating the Efficacy and Impacts of RL-Based HVAC Control | May 11, 2025 | Reinforcement Learning (RL) | Unverified | 0
X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real | May 11, 2025 | Domain Adaptation, Imitation Learning | Unverified | 0
LineFlow: A Framework to Learn Active Control of Production Lines | May 10, 2025 | Reinforcement Learning (RL) | Code Available | 0
REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback | May 10, 2025 | Reinforcement Learning (RL) | Unverified | 0
Balancing Progress and Safety: A Novel Risk-Aware Objective for RL in Autonomous Driving | May 10, 2025 | Autonomous Driving, Reinforcement Learning (RL) | Unverified | 0
Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach | May 10, 2025 | Autonomous Driving, Offline RL | Unverified | 0
Interaction-Aware Parameter Privacy-Preserving Data Sharing in Coupled Systems via Particle Filter Reinforcement Learning | May 9, 2025 | Decision Making, Privacy Preserving | Unverified | 0
Remote Rowhammer Attack using Adversarial Observations on Federated Learning Clients | May 9, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning | May 9, 2025 | D4RL, Offline RL | Unverified | 0
Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach | May 9, 2025 | Decision Making, Pose Estimation | Unverified | 0
Reinforcement Learning for Game-Theoretic Resource Allocation on Graphs | May 8, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
On Corruption-Robustness in Performative Reinforcement Learning | May 8, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
RL-DAUNCE: Reinforcement Learning-Driven Data Assimilation with Uncertainty-Aware Constrained Ensembles | May 8, 2025 | Computational Efficiency, Reinforcement Learning (RL) | Unverified | 0
Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach | May 8, 2025 | D4RL, Decision Making | Unverified | 0
USPR: Learning a Unified Solver for Profiled Routing | May 8, 2025 | Computational Efficiency, Decoder | Code Available | 0
Flow-GRPO: Training Flow Matching Models via Online RL | May 8, 2025 | Denoising, Diversity | Code Available | 7
Enhancing Reinforcement Learning for the Floorplanning of Analog ICs with Beam Search | May 8, 2025 | Reinforcement Learning (RL) | Unverified | 0
Multi-agent Embodied AI: Advances and Future Directions | May 8, 2025 | Navigate, Reinforcement Learning (RL) | Unverified | 0
Large Language Models are Autonomous Cyber Defenders | May 7, 2025 | Reinforcement Learning (RL) | Code Available | 0
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers | May 7, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0
ZeroSearch: Incentivize the Search Capability of LLMs without Searching | May 7, 2025 | Reinforcement Learning (RL), Retrieval | Code Available | 5
Extending a Quantum Reinforcement Learning Exploration Policy with Flags to Connect Four | May 7, 2025 | Reinforcement Learning (RL) | Unverified | 0
Fight Fire with Fire: Defending Against Malicious RL Fine-Tuning via Reward Neutralization | May 7, 2025 | Reinforcement Learning (RL) | Unverified | 0
Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions | May 7, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
Decentralized Distributed Proximal Policy Optimization (DD-PPO) for High Performance Computing Scheduling on Multi-User Systems | May 6, 2025 | Reinforcement Learning (RL), Scheduling | Unverified | 0
Deep Q-Network (DQN) multi-agent reinforcement learning (MARL) for Stock Trading | May 6, 2025 | Multi-agent Reinforcement Learning, Reinforcement Learning (RL) | Unverified | 0
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making | May 6, 2025 | Decision Making, General Knowledge | Unverified | 0
AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control | May 6, 2025 | Imitation Learning, Reinforcement Learning (RL) | Unverified | 0
Actor-Critics Can Achieve Optimal Sample Efficiency | May 6, 2025 | Reinforcement Learning (RL) | Unverified | 0
The Steganographic Potentials of Language Models | May 6, 2025 | Reinforcement Learning (RL) | Unverified | 0
Online Phase Estimation of Human Oscillatory Motions using Deep Learning | May 5, 2025 | Deep Learning, Position | Unverified | 0
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | May 5, 2025 | Reinforcement Learning (RL) | Code Available | 3