SOTAVerified

NetHack

Mean in-game score over 1000 episodes with random seeds not seen during training. See https://arxiv.org/abs/2006.13760 (Section 2.4 Evaluation Protocol) for details.
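The metric above can be sketched in a few lines: draw evaluation seeds that do not overlap with the training seeds, play one episode per seed, and average the final scores. This is only an illustrative sketch, not the protocol from the linked paper; `held_out_seeds`, `mean_score`, `dummy_episode`, and the training seed range are hypothetical names and values introduced here, and a real agent rollout would replace `dummy_episode`.

```python
import random
import statistics
from typing import Callable, Iterable, List


def held_out_seeds(n: int, training_seeds: Iterable[int], rng: random.Random) -> List[int]:
    """Draw n distinct seeds that do not appear in training_seeds."""
    seen = set(training_seeds)
    seeds: List[int] = []
    while len(seeds) < n:
        candidate = rng.randrange(2**31)
        if candidate not in seen:
            seen.add(candidate)  # also prevents duplicates among evaluation seeds
            seeds.append(candidate)
    return seeds


def mean_score(run_episode: Callable[[int], float], seeds: Iterable[int]) -> float:
    """Average the final in-game score over one episode per seed."""
    scores = [run_episode(seed) for seed in seeds]
    return statistics.fmean(scores)


def dummy_episode(seed: int) -> float:
    # Placeholder (assumption): stands in for playing one full game with this seed.
    return float(random.Random(seed).randint(0, 1000))


training_seeds = range(100)  # hypothetical seeds used during training
eval_seeds = held_out_seeds(1000, training_seeds, random.Random(0))
print(mean_score(dummy_episode, eval_seeds))
```

Keeping seed selection separate from scoring makes it easy to check that no evaluation seed was seen in training before any episode is played.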

Papers

Showing 1–25 of 28 papers

| Title | Status | Hype |
|---|---|---|
| MaestroMotif: Skill Design from Artificial Intelligence Feedback | | 0 |
| BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | | 0 |
| Syllabus: Portable Curricula for Reinforcement Learning Agents | Code | 2 |
| Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback | Code | 1 |
| PufferLib: Making Reinforcement Learning Libraries and Environments Play Nice | Code | 4 |
| Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents | Code | 1 |
| Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning | Code | 3 |
| Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem | Code | 0 |
| Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills | Code | 1 |
| diff History for Neural Language Agents | Code | 1 |
| Motif: Intrinsic Motivation from Artificial Intelligence Feedback | Code | 1 |
| Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors | | 0 |
| Scaling Laws for Imitation Learning in Single-Agent Games | Code | 1 |
| LuckyMera: a Modular AI Framework for Building Hybrid NetHack Agents | Code | 1 |
| Katakomba: Tools and Benchmarks for Data-Driven NetHack | Code | 1 |
| Accelerating exploration and representation learning with offline pre-training | | 0 |
| Dungeons and Data: A Large-Scale NetHack Dataset | Code | 2 |
| Improving Policy Learning via Language Dynamics Distillation | Code | 0 |
| Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning | Code | 1 |
| Insights From the NeurIPS 2021 NetHack Challenge | Code | 0 |
| SILG: The Multi-domain Symbolic Interactive Language Grounding Benchmark | | 0 |
| NovelD: A Simple yet Effective Exploration Criterion | Code | 1 |
| SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark | Code | 1 |
| CORA: Benchmarks, Baselines, and Metrics as a Platform for Continual Reinforcement Learning Agents | Code | 1 |
| MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research | Code | 0 |

No leaderboard results yet.