SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback, given as rewards or penalties for its actions. The goal is to find a policy, that is, a decision-making strategy, that maximizes the long-term reward.
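The interaction loop described above can be sketched with tabular Q-learning, one common RL algorithm, on a made-up five-state corridor. This is a minimal illustration under stated assumptions, not a method taken from any paper listed on this page: the environment, the function names (`step`, `train`, `greedy_policy`, `discounted_return`), and the hyperparameter values are all hypothetical.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# a corridor with states 0..4, the agent starts at 0 and the goal is state 4.
N_STATES = 5
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
GOAL = N_STATES - 1


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def discounted_return(rewards, gamma):
    """Cumulative reward with discount factor gamma, computed back to front."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties at random so the untrained agent does not always go left.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    """Learn action values Q[state][action] from interaction alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):  # step cap guarantees termination
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best-valued action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


policy = greedy_policy(train())
print(policy)
```

In this toy setting the learned policy should choose action 1 (step right) in every non-terminal state, since that is the shortest route to the only reward. `discounted_return` shows what "cumulative reward" means once a discount factor is applied: later rewards count for less than immediate ones.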

Papers

Showing 8901-8925 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Regularizing Trajectory Optimization with Denoising Autoencoders | | 0 |
| Regulating Reward Training by Means of Certainty Prediction in a Neural Network-Implemented Pong Game | | 0 |
| REIN-2: Giving Birth to Prepared Reinforcement Learning Agents Using Reinforcement Learning Agents | | 0 |
| ReinDSplit: Reinforced Dynamic Split Learning for Pest Recognition in Precision Agriculture | | 0 |
| ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning | | 0 |
| Reinforce Attack: Adversarial Attack against BERT with Reinforcement Learning | | 0 |
| Reinforced Anytime Bottom Up Rule Learning for Knowledge Graph Completion | | 0 |
| Reinforced Bit Allocation under Task-Driven Semantic Distortion Metrics | | 0 |
| Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center | | 0 |
| Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation | | 0 |
| Reinforced dynamics for enhanced sampling in large atomic and molecular systems | | 0 |
| Reinforced Extractive Summarization with Question-Focused Rewards | | 0 |
| Reinforced Hybrid Genetic Algorithm for the Traveling Salesman Problem | | 0 |
| Reinforced Imitation in Heterogeneous Action Space | | 0 |
| Reinforced Imitation Learning by Free Energy Principle | | 0 |
| Reinforced Inverse Scattering | | 0 |
| Reinforced Labels: Multi-Agent Deep Reinforcement Learning for Point-Feature Label Placement | | 0 |
| Reinforced Latent Reasoning for LLM-based Recommendation | | 0 |
| Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models | | 0 |
| Reinforced MOOCs Concept Recommendation in Heterogeneous Information Networks | | 0 |
| Reinforced Multi-task Approach for Multi-hop Question Generation | | 0 |
| Reinforced Pedestrian Attribute Recognition with Group Optimization Reward | | 0 |
| Reinforced Self-Training (ReST) for Language Modeling | | 0 |
| Reinforced Training Data Selection for Domain Adaptation | | 0 |
| Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API | | 0 |
Page 357 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |