SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
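The interaction loop described above can be illustrated with a minimal sketch. It uses tabular Q-learning, which is only one of many RL algorithms and is chosen here for brevity. The five-state corridor environment, the function names (`step`, `train`, `greedy_policy`), and all hyperparameter values are hypothetical and serve illustration only.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
GAMMA = 0.9           # discount factor applied to future rewards
ALPHA = 0.5           # learning rate (illustrative value)
EPSILON = 0.2         # probability of taking a random exploratory action


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # feedback arrives only at the goal
    return next_state, reward, done


def train(episodes=500, seed=0):
    """Learn action values by repeatedly interacting with the environment."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                a = rng.randrange(2)                      # explore
            else:
                best = max(q[state])                      # exploit, ties broken at random
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Target = immediate reward + discounted best value of the next state.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each non-terminal state."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this sketch the learned policy is the mapping from each state to its highest-valued action, and the discount factor `GAMMA` is what makes the agent weigh long-term reward rather than only the immediate one.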

Papers

Showing 7151–7175 of 15113 papers

Title | Status | Hype
Variational Inference for Policy Gradient | | 0
Variational Inference MPC for Bayesian Model-based Reinforcement Learning | | 0
Variational Intrinsic Control Revisited | | 0
Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition | | 0
Variational Meta Reinforcement Learning for Social Robotics | | 0
Variational Model-based Policy Optimization | | 0
Variational multiscale reinforcement learning for discovering reduced order closure models of nonlinear spatiotemporal transport systems | | 0
Variational oracle guiding for reinforcement learning | | 0
Variational Policy Gradient Method for Reinforcement Learning with General Utilities | | 0
Variational quantum compiling with double Q-learning | | 0
Parametrized quantum policies for reinforcement learning | | 0
Policy Gradients using Variational Quantum Circuits | | 0
Variational Quantum Reinforcement Learning via Evolutionary Optimization | | 0
Variational Quantum Soft Actor-Critic for Robotic Arm Control | | 0
Variational Regret Bounds for Reinforcement Learning | | 0
Variational Reward Estimator Bottleneck: Learning Robust Reward Estimator for Multi-Domain Task-Oriented Dialog | | 0
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks | | 0
VASE: Variational Assorted Surprise Exploration for Reinforcement Learning | | 0
Vehicle Tracking in Wireless Sensor Networks via Deep Reinforcement Learning | | 0
Vehicle Type Specific Waypoint Generation | | 0
Vehicular Cooperative Perception Through Action Branching and Federated Reinforcement Learning | | 0
Verifiable Reinforcement Learning Systems via Compositionality | | 0
Verification of Dissipativity and Evaluation of Storage Function in Economic Nonlinear MPC using Q-Learning | | 0
VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers | | 0
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified