SOTAVerified

Continuous Control

In game playing and simulation, particularly in artificial intelligence (AI) and machine learning (ML), continuous control refers to tasks in which an agent acts by making smooth, ongoing adjustments, so each action is a real-valued quantity such as a steering angle or a throttle level. This contrasts with discrete control, where each action is picked from a finite set of distinct choices. Continuous control matters in environments where the precision, timing, and magnitude of actions affect the outcome, such as driving a car in a racing game, controlling a character in a simulation, or flying an aircraft in a flight simulator.
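The difference between the two action types can be sketched in a few lines of Python. This is a minimal illustration, not taken from any of the listed papers: the steering and throttle example, the function names, and the action bounds are all assumptions made for the sketch.

```python
import numpy as np

# Hypothetical driving example: the action names and bounds are illustrative only.
DISCRETE_ACTIONS = ("left", "straight", "right")  # finite set of distinct choices


def discrete_policy(scores):
    """Discrete control: pick exactly one action from a fixed set."""
    return DISCRETE_ACTIONS[int(np.argmax(scores))]


def continuous_policy(raw_output, low, high):
    """Continuous control: map an unbounded real vector to bounded real-valued actions."""
    raw = np.asarray(raw_output, dtype=float)
    # tanh squashes each component into (-1, 1); then rescale to [low, high].
    return low + 0.5 * (np.tanh(raw) + 1.0) * (high - low)


# Assumed bounds: steering in [-1, 1], throttle in [0, 1].
low = np.array([-1.0, 0.0])
high = np.array([1.0, 1.0])

choice = discrete_policy([0.1, 0.7, 0.2])
action = continuous_policy([0.0, 2.0], low, high)

print(choice)  # a single label from the finite set
print(action)  # a real-valued [steering, throttle] vector inside the bounds
```

In the discrete case the agent can only say which option it picks; in the continuous case it also says how much, which is why magnitude and precision become part of the problem.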

Papers

Showing 951-1000 of 1161 papers

Title | Status | Hype
Hierarchical State Abstraction Based on Structural Information Principles | Code | 0
Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization | Code | 0
Discovering Diverse Solutions in Deep Reinforcement Learning by Maximizing State-Action-Based Mutual Information | Code | 0
Hallucinated Adversarial Control for Conservative Offline Policy Evaluation | Code | 0
Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation | Code | 0
Primal Wasserstein Imitation Learning | Code | 0
Learning model-based planning from scratch | Code | 0
Diagnosing Bottlenecks in Deep Q-learning Algorithms | Code | 0
Guided Policy Optimization under Partial Observability | Code | 0
Context-Based Soft Actor Critic for Environments with Non-stationary Dynamics | Code | 0
Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization | Code | 0
Behaviour Distillation | Code | 0
Sample-Efficient Imitation Learning via Generative Adversarial Nets | Code | 0
TaSIL: Taylor Series Imitation Learning | Code | 0
Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization | Code | 0
Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations | Code | 0
Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning | Code | 0
Learning Stabilizable Nonlinear Dynamics with Contraction-Based Regularization | Code | 0
Learning State Abstractions for Transfer in Continuous Control | Code | 0
Learning State Representations via Retracing in Reinforcement Learning | Code | 0
Bayesian Policy Gradients via Alpha Divergence Dropout Inference | Code | 0
CompILE: Compositional Imitation Learning and Execution | Code | 0
A novel DDPG method with prioritized experience replay | Code | 0
BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning | Code | 0
Guide Actor-Critic for Continuous Control | Code | 0
Proximal Policy Distillation | Code | 0
Defending Observation Attacks in Deep Reinforcement Learning via Detection and Denoising | Code | 0
Gradient Information Matters in Policy Optimization by Back-propagating through Model | Code | 0
Generative Actor-Critic: An Off-policy Algorithm Using the Push-forward Model | Code | 0
TEAC: Intergrating Trust Region and Max Entropy Actor Critic for Continuous Control | Code | 0
Which Model to Trust: Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms for Continuous Control Tasks | Code | 0
Learning with Expert Abstractions for Efficient Multi-Task Continuous Control | Code | 0
Balancing Value Underestimation and Overestimation with Realistic Actor-Critic | Code | 0
Leveraging exploration in off-policy algorithms via normalizing flows | Code | 0
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning | Code | 0
Zeroth-Order Actor-Critic: An Evolutionary Framework for Sequential Decision Problems | Code | 0
Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning | Code | 0
Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy | Code | 0
Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning | Code | 0
Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning | Code | 0
Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction | Code | 0
Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards | Code | 0
Collaborative Evolutionary Reinforcement Learning | Code | 0
Optimizing Attention and Cognitive Control Costs Using Temporally-Layered Architectures | Code | 0
Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents | Code | 0
Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs | Code | 0
Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots | Code | 0
General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States | Code | 0
Lyapunov-based Safe Policy Optimization for Continuous Control | Code | 0
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic | Code | 0
Page 20 of 24

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SAC gSDE | Return | 3,459 | | Unverified
2 | TD3 gSDE | Return | 3,267 | | Unverified
3 | TD3 | Return | 2,865 | | Unverified
4 | SAC | Return | 2,859 | | Unverified
5 | PPO gSDE | Return | 2,587 | | Unverified
6 | A2C gSDE | Return | 2,560 | | Unverified
7 | PPO | Return | 2,160 | | Unverified
8 | A2C | Return | 1,967 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SAC | Return | 2,883 | | Unverified
2 | SAC gSDE | Return | 2,850 | | Unverified
3 | PPO + gSDE | Return | 2,760 | | Unverified
4 | TD3 | Return | 2,687 | | Unverified
5 | TD3 gSDE | Return | 2,578 | | Unverified
6 | PPO | Return | 2,254 | | Unverified
7 | A2C + gSDE | Return | 2,028 | | Unverified
8 | A2C | Return | 1,652 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SAC gSDE | Return | 2,646 | | Unverified
2 | PPO gSDE | Return | 2,508 | | Unverified
3 | SAC | Return | 2,477 | | Unverified
4 | TD3 | Return | 2,470 | | Unverified
5 | TD3 gSDE | Return | 2,353 | | Unverified
6 | PPO | Return | 1,622 | | Unverified
7 | A2C | Return | 1,559 | | Unverified
8 | A2C gSDE | Return | 1,448 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SAC gSDE | Return | 2,341 | | Unverified
2 | SAC | Return | 2,215 | | Unverified
3 | TD3 | Return | 2,106 | | Unverified
4 | TD3 gSDE | Return | 1,989 | | Unverified
5 | PPO gSDE | Return | 1,776 | | Unverified
6 | PPO | Return | 1,238 | | Unverified
7 | A2C gSDE | Return | 694 | | Unverified
8 | A2C | Return | 443 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DreamerV1 | Return | 800 | | Unverified
2 | SLAC | Return | 700 | | Unverified
3 | DrQ | Return | 660 | | Unverified
4 | PlaNet | Return | 650 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 998.14 | | Unverified
2 | DREAMER | Return | 853 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 868.87 | | Unverified
2 | MuZero Unplugged | Return | 594.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 914.39 | | Unverified
2 | MuZero Unplugged | Return | 869.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DrQ | Return | 963 | | Unverified
2 | PlaNet | Return | 914 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DrQ | Return | 921 | | Unverified
2 | PlaNet | Return | 890 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 963.07 | | Unverified
2 | MuZero Unplugged | Return | 759 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 987.79 | | Unverified
2 | MuZero Unplugged | Return | 887.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 975.46 | | Unverified
2 | MuZero Unplugged | Return | 949.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 1,353.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | -326 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | -83.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | -149.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 417.52 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | -170.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 730.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | -0.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 0 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 977.38 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CURL | Score | 769 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CURL | Score | 959 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 984.86 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 4,869.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 960.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 606.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 980.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MAC | Score | 178.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CURL | Score | 582 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CURL | Score | 841 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 846.91 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CURL | Score | 299 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CURL | Score | 518 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 4,412.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 986.38 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CURL | Score | 767 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CURL | Score | 926 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 972.53 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MuZero Unplugged | Return | 681.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 287 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 1,914 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 1,183.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 528.24 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 926.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MuZero Unplugged | Return | 643.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 247.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 4.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 10.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 14.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MAC | Score | 163.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MuZero Unplugged | Return | 659.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MuZero Unplugged | Return | 556 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | -61.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | -64.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | -60.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | -61.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 837.76 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 923.54 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 933.77 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 982.26 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CURL | Score | 538 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CURL | Score | 929 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 971.53 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 269.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 96 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 0 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TRPO | Score | 0 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SMuZero | Return | 931.06 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CURL | Score | 403 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CURL | Score | 902 | | Unverified