SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
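The interaction loop described above can be sketched in a few lines. This is a minimal illustration, not a method taken from any paper listed here: it assumes a hypothetical five-cell corridor environment (`step`, `N_STATES`, and `ACTIONS` are made up for the example) and uses tabular Q-learning with epsilon-greedy exploration as one standard way to learn a policy from reward feedback.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (hypothetical toy task)
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # step cap so an episode always ends
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)  # explore, or break ties randomly
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Action with the highest learned value in each non-goal cell."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Here the reward arrives only at the goal, so the agent has to propagate it backwards through the discount factor `gamma`; the learned greedy policy is the "decision-making strategy" the definition refers to.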

Papers

Showing 13701–13750 of 15113 papers

Title | Status | Hype
Virtual-Taobao: Virtualizing Real-world Online Retail Environment for Reinforcement Learning | Code | 0
Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming | Code | 0
Reinforced Extractive Summarization with Question-Focused Rewards | - | 0
Resource Allocation for a Wireless Coexistence Management System Based on Reinforcement Learning | - | 0
Robust Distant Supervision Relation Extraction via Deep Reinforcement Learning | Code | 0
Meta-Gradient Reinforcement Learning | Code | 0
Deep Reinforcement Learning For Sequence to Sequence Models | Code | 1
A0C: Alpha Zero in Continuous Action Space | Code | 0
Intelligent Trainer for Model-Based Reinforcement Learning | Code | 0
Discovering Blind Spots in Reinforcement Learning | - | 0
Dyna Planning using a Feature Based Generative Model | - | 0
Deep Reinforcement Learning of Marked Temporal Point Processes | Code | 0
Scalable Coordinated Exploration in Concurrent Reinforcement Learning | Code | 0
Reinforcement Learning for Heterogeneous Teams with PALO Bounds | - | 0
When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms | - | 0
Verifiable Reinforcement Learning via Policy Extraction | Code | 1
Scalable Centralized Deep Multi-Agent Reinforcement Learning via Policy Gradients | - | 0
Guided Feature Transformation (GFT): A Neural Language Grounding Module for Embodied Agents | Code | 0
Multi-task Maximum Entropy Inverse Reinforcement Learning | Code | 0
Where Do You Think You're Going?: Inferring Beliefs about Dynamics from Behavior | Code | 0
Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning | - | 0
Learning Safe Policies with Expert Guidance | - | 0
Evolution-Guided Policy Gradient in Reinforcement Learning | Code | 0
Hierarchical Reinforcement Learning with Hindsight | - | 0
A General Family of Robust Stochastic Operators for Reinforcement Learning | - | 0
A Framework and Method for Online Inverse Reinforcement Learning | - | 0
Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation | - | 0
Data-Efficient Hierarchical Reinforcement Learning | Code | 0
Learning Real-World Robot Policies by Dreaming | - | 0
A Lyapunov-based Approach to Safe Reinforcement Learning | Code | 0
Constrained Policy Improvement for Safe and Efficient Reinforcement Learning | Code | 0
Unsupervised Video Object Segmentation for Deep Reinforcement Learning | Code | 0
Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications | Code | 0
Learning to Teach in Cooperative Multiagent Reinforcement Learning | - | 0
Episodic Memory Deep Q-Networks | - | 0
Reinforcement Learning of Theorem Proving | - | 0
Hierarchical Reinforcement Learning with Deep Nested Agents | - | 0
Improving Image Captioning with Conditional Generative Adversarial Nets | Code | 0
Solving the Rubik's Cube Without Human Knowledge | Code | 0
Two geometric input transformation methods for fast online reinforcement learning with neural nets | - | 0
Evolutionary RL for Container Loading | - | 0
Language Expansion In Text-Based Games | - | 0
Deep Reinforcement Learning for Resource Management in Network Slicing | - | 0
Learning Time-Sensitive Strategies in Space Fortress | Code | 0
FollowNet: Robot Navigation by Following Natural Language Directions with Deep Reinforcement Learning | - | 0
Optimized Computation Offloading Performance in Virtual Edge Computing Systems via Deep Reinforcement Learning | - | 0
Fast Retinomorphic Event Stream for Video Recognition and Reinforcement Learning | - | 0
Feedback-Based Tree Search for Reinforcement Learning | - | 0
Graph Signal Sampling via Reinforcement Learning | - | 0
Leveraging human knowledge in tabular reinforcement learning: A study of human subjects | - | 0
Page 275 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified