
Efficient Exploration

Efficient exploration is one of the main obstacles to scaling up modern deep reinforcement learning algorithms. The central challenge is balancing exploitation of current value estimates against gaining information about poorly understood states and actions.

Source: Randomized Value Functions via Multiplicative Normalizing Flows
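To make the trade-off concrete, here is a minimal, hypothetical sketch of Thompson sampling on Bernoulli bandit arms. It is not the method of the source paper (which randomizes deep Q-network weights via multiplicative normalizing flows), only an illustration of the same randomization principle: acting greedily on a random sample from each arm's posterior exploits well-understood arms while still probing uncertain ones.

```python
import random

# Hypothetical illustration, not taken from any paper listed below:
# Thompson sampling on Bernoulli bandit arms with Beta posteriors.
def thompson_sampling(true_probs, steps=10_000, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_probs)
    alpha = [1] * n_arms  # Beta(1, 1) uniform prior; alpha counts successes + 1
    beta = [1] * n_arms   # beta counts failures + 1
    total_reward = 0
    for _ in range(steps):
        # Draw one plausible mean per arm from its posterior,
        # then act greedily on the samples (randomized exploration).
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    posterior_means = [a / (a + b) for a, b in zip(alpha, beta)]
    return total_reward, posterior_means

if __name__ == "__main__":
    total, estimates = thompson_sampling([0.3, 0.5, 0.7])
    print("total reward:", total)
    print("posterior means:", [round(e, 2) for e in estimates])
```

Roughly speaking, randomized-value-function methods in the list below (e.g., HyperDQN, HyperAgent, ensemble approaches) scale this posterior-sampling idea to neural value functions, where exact posteriors are intractable.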

Papers

Showing 51–100 of 514 papers

Title | Status | Hype
HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning | Code | 1
Hybrid Genetic Search for the CVRP: Open-Source Implementation and SWAP* Neighborhood | Code | 1
BeBold: Exploration Beyond the Boundary of Explored Regions | Code | 1
Safe Guaranteed Exploration for Non-linear Systems | Code | 1
SC-Explorer: Incremental 3D Scene Completion for Safe and Efficient Exploration Mapping and Planning | Code | 1
Diffusion-Reinforcement Learning Hierarchical Motion Planning in Multi-agent Adversarial Games | Code | 1
Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration | Code | 1
Learning Dexterous Manipulation from Exemplar Object Trajectories and Pre-Grasps | Code | 1
DeepDrummer : Generating Drum Loops using Deep Learning and a Human in the Loop | Code | 1
A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules? | Code | 1
Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning | Code | 1
Maximum Entropy Reinforcement Learning with Diffusion Policy | Code | 1
Learning Exploration Policies for Navigation | Code | 1
Paradiseo: From a Modular Framework for Evolutionary Computation to the Automated Design of Metaheuristics ---22 Years of Paradiseo--- | Code | 1
A Survey of Label-Efficient Deep Learning for 3D Point Clouds | Code | 1
Layered and Staged Monte Carlo Tree Search for SMT Strategy Synthesis | Code | 1
Adversarially Guided Actor-Critic | Code | 1
Training a Generally Curious Agent | Code | 1
BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving | Code | 0
Adaptive teachers for amortized samplers | Code | 0
Bootstrapped Meta-Learning | Code | 0
ASCENT: Amplifying Power Side-Channel Resilience via Learning & Monte-Carlo Tree Search | Code | 0
Lagrangian Manifold Monte Carlo on Monge Patches | Code | 0
IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning | Code | 0
Information-Directed Exploration for Deep Reinforcement Learning | Code | 0
Instance Temperature Knowledge Distillation | Code | 0
Better Exploration with Optimistic Actor Critic | Code | 0
Scalable Exploration via Ensemble++ | Code | 0
Large-Batch, Iteration-Efficient Neural Bayesian Design Optimization | Code | 0
Behavior-Guided Actor-Critic: Improving Exploration via Learning Policy Behavior Representation for Deep Reinforcement Learning | Code | 0
Angrier Birds: Bayesian reinforcement learning | Code | 0
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation | Code | 0
Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent | Code | 0
Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization | Code | 0
An Empirical Evaluation of Posterior Sampling for Constrained Reinforcement Learning | Code | 0
Hierarchically Organized Latent Modules for Exploratory Search in Morphogenetic Systems | Code | 0
Goal-Reaching Policy Learning from Non-Expert Observations via Effective Subgoal Guidance | Code | 0
Data-Efficient Exploration, Optimization, and Modeling of Diverse Designs through Surrogate-Assisted Illumination | Code | 0
Bayesian Curiosity for Efficient Exploration in Reinforcement Learning | Code | 0
Go Beyond Imagination: Maximizing Episodic Reachability with World Models | Code | 0
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations | Code | 0
CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning | Code | 0
Amortized Variational Deep Q Network | Code | 0
Generalization and Exploration via Randomized Value Functions | Code | 0
Batch Bayesian Optimization via Local Penalization | Code | 0
Curiosity Driven Exploration of Learned Disentangled Goal Spaces | Code | 0
GLIB: Efficient Exploration for Relational Model-Based Reinforcement Learning via Goal-Literal Babbling | Code | 0
Personalized Algorithmic Recourse with Preference Elicitation | Code | 0
Curiosity as a Self-Supervised Method to Improve Exploration in De novo Drug Design | Code | 0
Balancing Value Underestimation and Overestimation with Realistic Actor-Critic | Code | 0
Page 2 of 11

No leaderboard results yet.