SOTAVerified

NetHack

Mean in-game score over 1000 episodes with random seeds not seen during training. See https://arxiv.org/abs/2006.13760 (Section 2.4 Evaluation Protocol) for details.
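As a rough illustration of the metric described above, the following sketch averages per-episode scores over seeds held out from training. It is not the benchmark's official evaluation code: `sample_eval_seeds`, `mean_score`, and `dummy_episode` are hypothetical names, and `dummy_episode` is a stand-in for an actual NetHack rollout that returns the final in-game score.

```python
import random
from statistics import mean
from typing import Callable, Iterable, List


def sample_eval_seeds(n: int, training_seeds: Iterable[int], rng: random.Random) -> List[int]:
    """Draw n distinct seeds, none of which appear in training_seeds."""
    excluded = set(training_seeds)
    seeds = set()
    while len(seeds) < n:
        candidate = rng.randrange(2**31)
        if candidate not in excluded:
            seeds.add(candidate)
    return sorted(seeds)


def mean_score(run_episode: Callable[[int], float], seeds: Iterable[int]) -> float:
    """Average the score returned by run_episode over all given seeds."""
    return mean(run_episode(seed) for seed in seeds)


def dummy_episode(seed: int) -> float:
    # Hypothetical placeholder: a real evaluation would play one full
    # NetHack episode with this seed and return the in-game score.
    return float(random.Random(seed).randint(0, 1000))


training_seeds = range(100)  # assumed set of seeds used during training
eval_seeds = sample_eval_seeds(1000, training_seeds, random.Random(0))
score = mean_score(dummy_episode, eval_seeds)
print(f"Mean score over {len(eval_seeds)} held-out episodes: {score:.1f}")
```

Passing the episode runner in as a callable keeps the averaging logic independent of any particular environment wrapper.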

Papers

Showing 21-28 of 28 papers

| Title | Status | Hype |
| --- | --- | --- |
| Improving Policy Learning via Language Dynamics Distillation | Code | 0 |
| Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem | Code | 0 |
| MaestroMotif: Skill Design from Artificial Intelligence Feedback | | 0 |
| BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | | 0 |
| Exploration in NetHack With Secret Discovery | | 0 |
| Accelerating exploration and representation learning with offline pre-training | | 0 |
| Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors | | 0 |
| SILG: The Multi-domain Symbolic Interactive Language Grounding Benchmark | | 0 |

No leaderboard results yet.