NetHack

Mean in-game score over 1000 episodes with random seeds not seen during training. See https://arxiv.org/abs/2006.13760 (Section 2.4 Evaluation Protocol) for details.
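
The metric described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the evaluation code from the cited paper: `held_out_seeds`, `mean_score`, and the `run_episode` callable are hypothetical names, and `run_episode` is assumed to play one full episode with the given seed and return its final in-game score.

```python
import random
from statistics import mean
from typing import Callable, Iterable, List


def held_out_seeds(train_seeds: Iterable[int], n: int, rng_seed: int = 0) -> List[int]:
    """Draw n distinct seeds, none of which appear in train_seeds (hypothetical helper)."""
    seen = set(train_seeds)
    rng = random.Random(rng_seed)
    seeds: List[int] = []
    while len(seeds) < n:
        candidate = rng.randrange(2**31)
        if candidate not in seen:
            seen.add(candidate)  # also prevents duplicates among the drawn seeds
            seeds.append(candidate)
    return seeds


def mean_score(run_episode: Callable[[int], float], seeds: Iterable[int]) -> float:
    """Average the final score returned by run_episode over the given seeds."""
    return mean([run_episode(seed) for seed in seeds])


if __name__ == "__main__":
    # Assumption: training used seeds 0..99; a real agent would replace the dummy episode.
    eval_seeds = held_out_seeds(train_seeds=range(100), n=1000)
    dummy_episode = lambda seed: float(seed % 7)  # stand-in for playing one episode
    print(mean_score(dummy_episode, eval_seeds))
```

Passing the episode runner in as a callable keeps the sketch independent of any particular environment API; how seeds are actually applied to the game is left to that callable.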

Papers

Showing 21–28 of 28 papers

Title	Date	Tasks	Status	Score
Improving Policy Learning via Language Dynamics Distillation	Sep 30, 2022	NetHack, Reinforcement Learning (RL)	Code Available	5
MaestroMotif: Skill Design from Artificial Intelligence Feedback	Dec 11, 2024	Code Generation, Decision Making	Unverified	0
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games	Nov 20, 2024	Benchmarking, NetHack	Unverified	0
Exploration in NetHack With Secret Discovery	Nov 8, 2017	NetHack	Unverified	0
Accelerating exploration and representation learning with offline pre-training	Mar 31, 2023	Decision Making, NetHack	Unverified	0
MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research	Sep 27, 2021	Deep Reinforcement Learning, NetHack	Unverified	0
Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors	Jul 21, 2023	Decision Making, Language Modeling	Unverified	0
SILG: The Multi-domain Symbolic Interactive Language Grounding Benchmark	Dec 1, 2021	Grounded language learning, NetHack	Unverified	0


No leaderboard results yet.