SOTAVerified

Efficient Exploration

Efficient exploration is one of the main obstacles to scaling up modern deep reinforcement learning algorithms. The central challenge is balancing exploitation of current estimates against gathering information about poorly understood states and actions.

Source: Randomized Value Functions via Multiplicative Normalizing Flows
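
As a rough illustration of the trade-off described above, the following sketch runs UCB1 on a small Bernoulli bandit. UCB1 is a standard textbook algorithm chosen here for brevity; it is not the method of the source paper. The arm means, step count, and the function name `ucb1` are assumptions made for the example. Each arm's score is its current reward estimate (exploitation) plus a bonus that grows for rarely tried arms (exploration).

```python
import math
import random


def ucb1(true_means, steps, seed=0):
    """Run UCB1 on a Bernoulli bandit and return the pull count per arm.

    true_means is a hypothetical list of success probabilities, one per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # how often each arm was pulled
    values = [0.0] * n_arms    # running mean reward per arm

    for t in range(1, steps + 1):
        if t <= n_arms:
            # Pull every arm once so each has an initial estimate.
            arm = t - 1
        else:
            # Estimate (exploit) plus uncertainty bonus (explore).
            scores = [
                values[a] + math.sqrt(2.0 * math.log(t) / counts[a])
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=scores.__getitem__)

        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean.
        values[arm] += (reward - values[arm]) / counts[arm]

    return counts


counts = ucb1([0.2, 0.5, 0.8], steps=2000)
print(counts)
```

With these assumed means, most pulls should concentrate on the last arm, while the bonus term keeps the other arms from being abandoned after a few unlucky draws.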

Papers

Showing 401-450 of 514 papers

Title | Status | Hype
Adaptive teachers for amortized samplers | Code | 0
TransNAS-TSAD: Harnessing Transformers for Multi-Objective Neural Architecture Search in Time Series Anomaly Detection | Code | 0
ASCENT: Amplifying Power Side-Channel Resilience via Learning & Monte-Carlo Tree Search | Code | 0
Generalization and Exploration via Randomized Value Functions | Code | 0
Personalized Algorithmic Recourse with Preference Elicitation | Code | 0
Feature Interaction Aware Automated Data Representation Transformation | Code | 0
EXPODE: EXploiting POlicy Discrepancy for Efficient Exploration in Multi-agent Reinforcement Learning | Code | 0
GenPlan: Generative Sequence Models as Adaptive Planners | Code | 0
Exploring through Random Curiosity with General Value Functions | Code | 0
Scalable Exploration via Ensemble++ | Code | 0
Randomized Value Functions via Multiplicative Normalizing Flows | Code | 0
GLIB: Efficient Exploration for Relational Model-Based Reinforcement Learning via Goal-Literal Babbling | Code | 0
Exploratory State Representation Learning | Code | 0
The MineRL 2019 Competition on Sample Efficient Reinforcement Learning using Human Priors | Code | 0
Goal-Reaching Policy Learning from Non-Expert Observations via Effective Subgoal Guidance | Code | 0
Go Beyond Imagination: Maximizing Episodic Reachability with World Models | Code | 0
Receding Horizon Curiosity | Code | 0
Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives | Code | 0
Multi-Objective Hyperparameter Selection via Hypothesis Testing on Reliability Graphs | Code | 0
Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards | Code | 0
Umbrella Reinforcement Learning -- computationally efficient tool for hard non-linear problems | Code | 0
Multirobot Coverage of Modular Environments | Code | 0
Sparse Reward Exploration via Novelty Search and Emitters | Code | 0
Count-Based Exploration in Feature Space for Reinforcement Learning | Code | 0
Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes | Code | 0
The split Gibbs sampler revisited: improvements to its algorithmic structure and augmented target distribution | Code | 0
Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization | Code | 0
Hierarchically Organized Latent Modules for Exploratory Search in Morphogenetic Systems | Code | 0
Neural Contextual Bandits with UCB-based Exploration | Code | 0
Dynamic Subgoal-based Exploration via Bayesian Optimization | Code | 0
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation | Code | 0
Estimating Risk and Uncertainty in Deep Reinforcement Learning | Code | 0
Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent | Code | 0
Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching | Code | 0
Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure | Code | 0
Variance Networks: When Expectation Does Not Meet Your Expectations | Code | 0
STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | Code | 0
Noise-Adaptive Confidence Sets for Linear Bandits and Application to Bayesian Optimization | Code | 0
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations | Code | 0
Noisy Natural Gradient as Variational Inference | Code | 0
Noisy Networks for Exploration | Code | 0
Angrier Birds: Bayesian reinforcement learning | Code | 0
Reward-Centered ReST-MCTS: A Robust Decision-Making Framework for Robotic Manipulation in High Uncertainty Environments | Code | 0
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables | Code | 0
Nonlinear model reduction for slow-fast stochastic systems near unknown invariant manifolds | Code | 0
A New Bandit Setting Balancing Information from State Evolution and Corrupted Context | Code | 0
Information-Directed Exploration for Deep Reinforcement Learning | Code | 0
IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning | Code | 0
Instance Temperature Knowledge Distillation | Code | 0
Consensus-based adaptive sampling and approximation for high-dimensional energy landscapes | Code | 0

No leaderboard results yet.