SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback on its actions, given as rewards or penalties. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
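The interaction loop described above can be illustrated with a minimal sketch. It assumes a hypothetical toy task that is not taken from any listed paper: a five-cell corridor where the agent starts in cell 0 and receives a reward of 1 on reaching cell 4. The learner is plain tabular Q-learning with an epsilon-greedy policy. All names (`step`, `train`, `greedy_policy`, `discounted_return`) and hyperparameter values are illustrative assumptions.

```python
import random

N_STATES = 5      # corridor cells 0..4; cell 4 is the goal (hypothetical toy task)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def discounted_return(rewards, gamma):
    """Cumulative reward of one episode: sum over t of gamma**t * r_t."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length so every episode terminates
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])          # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action in each non-terminal state under the learned Q-table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this sketch the reward feedback is the only learning signal: the value of the goal propagates backwards through the Q-table, and the greedy policy read from that table is the decision-making strategy the agent has learned. Real benchmarks replace the toy `step` function with a simulator and the table with a function approximator.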

Papers

Showing 1601–1625 of 15113 papers

Title | Status | Hype
DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors | Code | 1
NEORL: NeuroEvolution Optimization with Reinforcement Learning | Code | 1
DNA: Proximal Policy Optimization with a Dual Network Architecture | Code | 1
Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization | Code | 1
Neural Inventory Control in Networks via Hindsight Differentiable Policy Optimization | Code | 1
Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling | Code | 1
Battlesnake Challenge: A Multi-agent Reinforcement Learning Playground with Human-in-the-loop | Code | 1
Does Zero-Shot Reinforcement Learning Exist? | Code | 1
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs | Code | 1
NeuralSympCheck: A Symptom Checking and Disease Diagnostic Neural Model with Logic Regularization | Code | 1
Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning | Code | 1
An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling Problems Based on Constraint Programming | Code | 1
An End-to-end Deep Reinforcement Learning Approach for the Long-term Short-term Planning on the Frenet Space | Code | 1
Neurosymbolic Reinforcement Learning with Formally Verified Exploration | Code | 1
Batch Exploration with Examples for Scalable Robotic Reinforcement Learning | Code | 1
Building a 3-Player Mahjong AI using Deep Reinforcement Learning | Code | 1
Asynchronous Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-Robot Cooperative Exploration | Code | 1
No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO | Code | 1
Generating π-Functional Molecules Using STGG+ with Active Learning | Code | 1
Doubly Mild Generalization for Offline Reinforcement Learning | Code | 1
GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning | Code | 1
DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems | Code | 1
Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient | Code | 1
Drafting in Collectible Card Games via Reinforcement Learning | Code | 1
Graph Convolution-Based Deep Reinforcement Learning for Multi-Agent Decision-Making in Mixed Traffic Environments | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified