SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, i.e., a decision-making strategy mapping situations (states) to actions, that maximizes the expected long-term reward.
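To make the agent-environment loop described above concrete, here is a minimal sketch of tabular Q-learning, one classic RL algorithm chosen purely for illustration (this page does not prescribe any particular method). The corridor environment, the functions `step`, `q_learning`, and `greedy_policy`, and all hyperparameters (`alpha`, `gamma`, `epsilon`) are hypothetical choices made for this example.

```python
import random

# Hypothetical toy environment: a 1-D corridor of N_STATES cells.
# The agent starts in cell 0; reaching the rightmost cell gives reward +1
# and ends the episode. Every other step gives reward 0.
N_STATES = 5
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[s][a] estimates the discounted return of taking action index a in state s.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                # Break ties randomly so the untrained agent does not get stuck.
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: reward plus discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action index for every non-terminal state."""
    return [max(range(len(ACTIONS)), key=lambda i: q[s][i]) for s in range(N_STATES - 1)]


q = q_learning()
print(greedy_policy(q))
```

Randomized tie-breaking means the untrained agent initially performs a random walk rather than always picking the first action; once the +1 reward has been found, the update rule propagates value backwards, and with these settings the greedy policy should choose "move right" (index 1) in every non-terminal cell.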

Papers

Showing 2551-2575 of 15113 papers

Title | Status | Hype
Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning | | 0
DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management | | 0
Exploiting Symbolic Heuristics for the Synthesis of Domain-Specific Temporal Planning Guidance using Reinforcement Learning | | 0
Power Allocation for Delay Optimization in Device-to-Device Networks: A Graph Reinforcement Learning Approach | | 0
Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis | | 0
Counterfactual Explanations for Continuous Action Reinforcement Learning | Code | 0
Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs | | 0
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving | | 0
Your Offline Policy is Not Trustworthy: Bilevel Reinforcement Learning for Sequential Portfolio Optimization | | 0
Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning | | 0
Benchmarking MOEAs for solving continuous multi-objective RL problems | Code | 0
UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning | | 0
AbFlowNet: Optimizing Antibody-Antigen Binding Energy via Diffusion-GFlowNet Fusion | | 0
Distributional Soft Actor-Critic with Harmonic Gradient for Safe and Efficient Autonomous Driving in Multi-lane Scenarios | | 0
A Finite-Sample Analysis of Distributionally Robust Average-Reward Reinforcement Learning | | 0
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning | Code | 0
Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents | | 0
Resolving Latency and Inventory Risk in Market Making with Reinforcement Learning | | 0
Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling | | 0
Online Iterative Self-Alignment for Radiology Report Generation | | 0
Retrospex: Language Agent Meets Offline Reinforcement Learning Critic | Code | 0
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning | | 0
Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning | | 0
J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge | | 0
An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified
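The table does not define "Mean Normalized Performance". A common convention, assumed here and not confirmed by this page, is to min-max normalize each environment's score with per-environment reference bounds, (R - R_min) / (R_max - R_min), and then average across environments. The sketch below computes that quantity; the function names, environment names, and bound values are hypothetical placeholders, not the benchmark's actual constants.

```python
def normalized_score(raw, r_min, r_max):
    """Min-max normalize one environment's score (assumed convention)."""
    return (raw - r_min) / (r_max - r_min)


def mean_normalized_performance(raw_scores, bounds):
    """Average the normalized scores over all environments in raw_scores.

    raw_scores: {env_name: raw_return}
    bounds:     {env_name: (r_min, r_max)}  # hypothetical reference bounds
    """
    normalized = [normalized_score(raw_scores[env], *bounds[env]) for env in raw_scores]
    return sum(normalized) / len(normalized)


# Hypothetical numbers for illustration only.
bounds = {"env_a": (0.0, 10.0), "env_b": (5.0, 25.0)}
scores = {"env_a": 8.0, "env_b": 15.0}
print(mean_normalized_performance(scores, bounds))  # 0.65
```

Here env_a normalizes to 0.8 and env_b to 0.5, so the mean is 0.65. Under this convention, 0 corresponds to the lower reference bound and 1 to the upper bound in each environment.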