
NetHack

Metric: mean in-game score over 1,000 episodes, each run with a random seed not seen during training. See https://arxiv.org/abs/2006.13760 (Section 2.4, Evaluation Protocol) for details.
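The protocol comes down to averaging per-episode final scores over 1,000 seeds that were held out from training. Below is a minimal Python sketch of that averaging step. It assumes a hypothetical `run_episode(seed)` callback that plays one full game with the given seed and returns its final in-game score. The seed-sampling scheme and the `evaluate_mean_score` helper are illustrative assumptions, not the procedure from the linked paper.

```python
import random
import statistics
from typing import Callable, List, Set


def evaluate_mean_score(
    run_episode: Callable[[int], float],
    training_seeds: Set[int],
    num_episodes: int = 1000,
    rng_seed: int = 0,
) -> float:
    """Mean in-game score over `num_episodes` distinct, held-out seeds.

    `run_episode` is a hypothetical callback: it plays one full episode
    with the given environment seed and returns the final in-game score.
    """
    rng = random.Random(rng_seed)
    seen = set(training_seeds)  # seeds that must not be reused for evaluation
    eval_seeds: List[int] = []
    while len(eval_seeds) < num_episodes:
        seed = rng.randrange(2**31)
        if seed not in seen:  # skip training seeds and duplicates
            seen.add(seed)
            eval_seeds.append(seed)
    scores = [run_episode(seed) for seed in eval_seeds]
    return statistics.fmean(scores)
```

Sampling seeds from a separate RNG and excluding the training set keeps the evaluation episodes disjoint from training, which matches the "random seeds not seen during training" requirement.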

Papers

Showing 11–20 of 28 papers

Title | Status | Hype
CORA: Benchmarks, Baselines, and Metrics as a Platform for Continual Reinforcement Learning Agents | Code | 1
SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark | Code | 1
Motif: Intrinsic Motivation from Artificial Intelligence Feedback | Code | 1
Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback | Code | 1
NovelD: A Simple yet Effective Exploration Criterion | Code | 1
Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents | Code | 1
Scaling Laws for Imitation Learning in Single-Agent Games | Code | 1
diff History for Neural Language Agents | Code | 1
MaestroMotif: Skill Design from Artificial Intelligence Feedback | - | 0
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | - | 0

No leaderboard results yet.