SOTAVerified

Efficient Exploration

Efficient exploration is one of the main obstacles to scaling up modern deep reinforcement learning algorithms. The central challenge is balancing exploitation of current estimates against gathering information about poorly understood states and actions.

Source: Randomized Value Functions via Multiplicative Normalizing Flows
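
As an illustration of the exploitation versus information-gathering trade-off described above, here is a minimal sketch of UCB1 on a Bernoulli multi-armed bandit. It is not taken from the cited source or from any paper listed on this page; the function name `ucb1`, the arm means, and the step count are assumptions chosen for the example. Each arm's score is its current mean estimate (exploitation) plus a bonus that shrinks as the arm is pulled more often (exploration).

```python
import math
import random


def ucb1(true_means, steps, seed=0):
    """Run UCB1 on a Bernoulli bandit and return the pull count per arm.

    true_means: hypothetical success probabilities, unknown to the agent.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # number of pulls per arm
    sums = [0.0] * n_arms      # total reward collected per arm

    for t in range(1, steps + 1):
        if t <= n_arms:
            # Pull every arm once so each estimate is defined.
            arm = t - 1
        else:
            # Score = current estimate + uncertainty bonus.
            scores = [
                sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=scores.__getitem__)

        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts


if __name__ == "__main__":
    # Illustrative arm means only; not data from any listed paper.
    print(ucb1([0.1, 0.5, 0.9], steps=3000))
```

In a run like this, most pulls should concentrate on the arm with the highest mean, while the weaker arms still receive occasional pulls as their uncertainty bonus grows with `log(t)`. Deep reinforcement learning methods replace the per-arm counts with learned uncertainty estimates, which this sketch does not attempt to model.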

Papers

Showing 426–450 of 514 papers

Title | Status | Hype
The split Gibbs sampler revisited: improvements to its algorithmic structure and augmented target distribution | Code | 0
Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization | Code | 0
Hierarchically Organized Latent Modules for Exploratory Search in Morphogenetic Systems | Code | 0
Neural Contextual Bandits with UCB-based Exploration | Code | 0
Dynamic Subgoal-based Exploration via Bayesian Optimization | Code | 0
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation | Code | 0
Estimating Risk and Uncertainty in Deep Reinforcement Learning | Code | 0
Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent | Code | 0
Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching | Code | 0
Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure | Code | 0
Variance Networks: When Expectation Does Not Meet Your Expectations | Code | 0
STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | Code | 0
Noise-Adaptive Confidence Sets for Linear Bandits and Application to Bayesian Optimization | Code | 0
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations | Code | 0
Noisy Natural Gradient as Variational Inference | Code | 0
Noisy Networks for Exploration | Code | 0
Angrier Birds: Bayesian reinforcement learning | Code | 0
Reward-Centered ReST-MCTS: A Robust Decision-Making Framework for Robotic Manipulation in High Uncertainty Environments | Code | 0
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables | Code | 0
Nonlinear model reduction for slow-fast stochastic systems near unknown invariant manifolds | Code | 0
A New Bandit Setting Balancing Information from State Evolution and Corrupted Context | Code | 0
Information-Directed Exploration for Deep Reinforcement Learning | Code | 0
IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning | Code | 0
Instance Temperature Knowledge Distillation | Code | 0
Consensus-based adaptive sampling and approximation for high-dimensional energy landscapes | Code | 0

No leaderboard results yet.