
NetHack

Mean in-game score over 1000 episodes with random seeds not seen during training. See https://arxiv.org/abs/2006.13760 (Section 2.4 Evaluation Protocol) for details.
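As a rough illustration of the metric described above, the following sketch averages the final score over a set of evaluation seeds that were excluded from training. The names `held_out_seeds`, `mean_score`, and `run_episode` are hypothetical and are not part of the NetHack Learning Environment API; `run_episode` stands in for whatever code plays one seeded episode and returns its in-game score.

```python
import random
import statistics
from typing import Callable, Iterable, List, Set


def held_out_seeds(n: int, training_seeds: Set[int], rng: random.Random) -> List[int]:
    """Draw n distinct seeds, none of which appear in training_seeds."""
    seeds: Set[int] = set()
    while len(seeds) < n:
        candidate = rng.randrange(2**31)
        if candidate not in training_seeds:
            seeds.add(candidate)
    return sorted(seeds)


def mean_score(run_episode: Callable[[int], float], seeds: Iterable[int]) -> float:
    """Average the in-game score returned by run_episode over all seeds."""
    return statistics.fmean(run_episode(seed) for seed in seeds)


if __name__ == "__main__":
    # Hypothetical stand-in for an agent: the score depends only on the seed.
    def fake_episode(seed: int) -> float:
        return float(seed % 100)

    eval_seeds = held_out_seeds(1000, training_seeds={1, 2, 3}, rng=random.Random(0))
    print(f"mean score over {len(eval_seeds)} episodes: {mean_score(fake_episode, eval_seeds):.2f}")
```

Keeping the seed selection separate from the scoring makes it easy to check that no evaluation seed overlaps with the training set before any episode is run.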

Papers


Showing 11–20 of 28 papers

Title	Date	Tasks	Status	Hype
LuckyMera: a Modular AI Framework for Building Hybrid NetHack Agents	Jul 17, 2023	NetHack	Code Available	1
CORA: Benchmarks, Baselines, and Metrics as a Platform for Continual Reinforcement Learning Agents	Oct 19, 2021	NetHack, reinforcement-learning	Code Available	1
Motif: Intrinsic Motivation from Artificial Intelligence Feedback	Sep 29, 2023	Decision Making, Language Modeling	Code Available	1
Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback	Oct 30, 2024	Decision Making, Language Modeling	Code Available	1
NovelD: A Simple yet Effective Exploration Criterion	Dec 1, 2021	Atari Games, Deep Reinforcement Learning	Code Available	1
Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents	Mar 1, 2024	Decision Making, Minecraft	Code Available	1
Scaling Laws for Imitation Learning in Single-Agent Games	Jul 18, 2023	Atari Games, Imitation Learning	Code Available	1
diff History for Neural Language Agents	Dec 12, 2023	Decision Making, NetHack	Code Available	1
MaestroMotif: Skill Design from Artificial Intelligence Feedback	Dec 11, 2024	Code Generation, Decision Making	Unverified	0
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games	Nov 20, 2024	Benchmarking, NetHack	Unverified	0



No leaderboard results yet.