Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | Mar 25, 2025 | Math; Reinforcement Learning (RL) | Unverified 0
NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios | Mar 25, 2025 | Benchmarking; Offline RL | Code Available 1
AED: Automatic Discovery of Effective and Diverse Vulnerabilities for Autonomous Driving Policy with Large Language Models | Mar 24, 2025 | Autonomous Driving; Reinforcement Learning (RL) | Unverified 0
Option Discovery Using LLM-guided Semantic Hierarchical Reinforcement Learning | Mar 24, 2025 | Decision Making; Hierarchical Reinforcement Learning | Unverified 0
Evolutionary Policy Optimization | Mar 24, 2025 | Diversity; Evolutionary Algorithms | Unverified 0
Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling | Mar 24, 2025 | Benchmarking; OpenAI Gym | Code Available 0
Continual Reinforcement Learning for HVAC Systems Control: Integrating Hypernetworks and Transfer Learning | Mar 24, 2025 | Continual Learning; Deep Reinforcement Learning | Code Available 0
RLCAD: Reinforcement Learning Training Gym for Revolution Involved CAD Command Sequence Generation | Mar 24, 2025 | Reinforcement Learning (RL) | Unverified 0
Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning | Mar 24, 2025 | Language Modeling; Language Modelling | Unverified 0
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild | Mar 24, 2025 | Instruction Following; Math | Code Available 7
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse | Mar 24, 2025 | Layout Generation; Reinforcement Learning (RL) | Code Available 3
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training | Mar 24, 2025 | Diversity; Large Language Model | Code Available 1
Parental Guidance: Efficient Lifelong Learning through Evolutionary Distillation | Mar 24, 2025 | Continual Learning; Diversity | Unverified 0
Sample-Efficient Reinforcement Learning of Koopman eNMPC | Mar 24, 2025 | reinforcement-learning; Reinforcement Learning | Unverified 0
Adaptive Multi-Fidelity Reinforcement Learning for Variance Reduction in Engineering Design Optimization | Mar 23, 2025 | Reinforcement Learning (RL); Scheduling | Unverified 0
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization | Mar 23, 2025 | Reinforcement Learning (RL); Response Generation | Unverified 0
ViVa: Video-Trained Value Functions for Guiding Online RL from Diverse Data | Mar 23, 2025 | Reinforcement Learning (RL) | Unverified 0
Optimizing Navigation And Chemical Application in Precision Agriculture With Deep Reinforcement Learning And Conditional Action Tree | Mar 23, 2025 | Decision Making; Deep Reinforcement Learning | Unverified 0
Surrogate Learning in Meta-Black-Box Optimization: A Preliminary Study | Mar 23, 2025 | Kolmogorov-Arnold Networks; Reinforcement Learning (RL) | Code Available 2
Transferable Latent-to-Latent Locomotion Policy for Efficient and Versatile Motion Control of Diverse Legged Robots | Mar 22, 2025 | Reinforcement Learning (RL) | Unverified 0
A Roadmap Towards Improving Multi-Agent Reinforcement Learning With Causal Discovery And Inference | Mar 22, 2025 | Causal Discovery; Multi-agent Reinforcement Learning | Unverified 0
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation | Mar 22, 2025 | Image Generation; Reinforcement Learning (RL) | Unverified 0
Causally Aligned Curriculum Learning | Mar 21, 2025 | Reinforcement Learning (RL) | Unverified 0
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement | Mar 21, 2025 | Multimodal Reasoning; Reinforcement Learning (RL) | Code Available 2
Curriculum RL meets Monte Carlo Planning: Optimization of a Real World Container Management Problem | Mar 21, 2025 | Collision Avoidance; Management | Code Available 0
Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent | Mar 21, 2025 | Large Language Model; Privacy Preserving | Unverified 0
Towards Automated Semantic Interpretability in Reinforcement Learning via Vision-Language Models | Mar 20, 2025 | reinforcement-learning; Reinforcement Learning | Unverified 0
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning | Mar 20, 2025 | Reinforcement Learning (RL) | Unverified 0
Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning | Mar 20, 2025 | Classification; Few-Shot Learning | Code Available 2
Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming | Mar 20, 2025 | Combinatorial Optimization; reinforcement-learning | Code Available 0
Grammar and Gameplay-aligned RL for Game Description Generation with LLMs | Mar 20, 2025 | reinforcement-learning; Reinforcement Learning | Unverified 0
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't | Mar 20, 2025 | Mathematical Reasoning; Reinforcement Learning (RL) | Code Available 3
RL4Med-DDPO: Reinforcement Learning for Controlled Guidance Towards Diverse Medical Image Generation using Vision-Language Foundation Models | Mar 20, 2025 | Image Generation; Medical Image Generation | Unverified 0
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Mar 20, 2025 | Benchmarking; Reinforcement Learning (RL) | Code Available 4
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning | Mar 20, 2025 | Decision Making; Language Modeling | Code Available 4
UAS Visual Navigation in Large and Unseen Environments via a Meta Agent | Mar 20, 2025 | Incremental Learning; Meta Reinforcement Learning | Unverified 0
Comprehensive Review of Reinforcement Learning for Medical Ultrasound Imaging | Mar 19, 2025 | reinforcement-learning; Reinforcement Learning | Unverified 0
Behaviour Discovery and Attribution for Explainable Reinforcement Learning | Mar 19, 2025 | Offline RL; reinforcement-learning | Unverified 0
Reinforcement Learning Environment with LLM-Controlled Adversary in D&D 5th Edition Combat | Mar 19, 2025 | Decision Making; Reinforcement Learning (RL) | Unverified 0
Good Actions Succeed, Bad Actions Generalize: A Case Study on Why RL Generalizes Better | Mar 19, 2025 | Attribute; Reinforcement Learning (RL) | Unverified 0
Empowering Medical Multi-Agents with Clinical Consultation Flow for Dynamic Diagnosis | Mar 19, 2025 | Decision Making; Diagnostic | Unverified 0
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning | Mar 19, 2025 | reinforcement-learning; Reinforcement Learning | Unverified 0
LogLLaMA: Transformer-based log anomaly detection with LLaMA | Mar 19, 2025 | Anomaly Detection; Reinforcement Learning (RL) | Unverified 0
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities | Mar 19, 2025 | Reinforcement Learning (RL); Self-Supervised Learning | Unverified 0
Reward Training Wheels: Adaptive Auxiliary Rewards for Robotics Reinforcement Learning | Mar 19, 2025 | Reinforcement Learning (RL) | Unverified 0
Neural Lyapunov Function Approximation with Self-Supervised Reinforcement Learning | Mar 19, 2025 | reinforcement-learning; Reinforcement Learning | Code Available 0
Reinforcement learning-based motion imitation for physiologically plausible musculoskeletal motor control | Mar 18, 2025 | Humanoid Control; Motion Synthesis | Code Available 2
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning | Mar 18, 2025 | 3D Face Animation; Common Sense Reasoning | Code Available 4
CTSAC: Curriculum-Based Transformer Soft Actor-Critic for Goal-Oriented Robot Exploration | Mar 18, 2025 | reinforcement-learning; Reinforcement Learning | Unverified 0
Pauli Network Circuit Synthesis with Reinforcement Learning | Mar 18, 2025 | reinforcement-learning; Reinforcement Learning | Unverified 0