| MinePlanner: A Benchmark for Long-Horizon Planning in Large Minecraft Worlds | Dec 20, 2023 | Minecraft | Code Available | 1 |
| BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks | Dec 5, 2023 | Benchmarking, Minecraft | Code Available | 1 |
| Creative Agents: Empowering Agents with Imagination for Creative Tasks | Dec 5, 2023 | Instruction Following, Language Modelling | Code Available | 1 |
| Convolutional State Space Models for Long-Range Spatiotemporal Modeling | Oct 30, 2023 | Minecraft, State Space Models | Code Available | 1 |
| Towards Evaluating Generalist Agents: An Automated Benchmark in Open World | Oct 12, 2023 | Benchmarking, Diversity | Code Available | 1 |
| Large Language Model (LLM) as a System of Multiple Expert Agents: An Approach to solve the Abstraction and Reasoning Corpus (ARC) Challenge | Oct 8, 2023 | ARC, Language Modeling | Code Available | 1 |
| SmartPlay: A Benchmark for LLMs as Intelligent Agents | Oct 2, 2023 | Minecraft, Spatial Reasoning | Code Available | 1 |
| Semantic HELM: A Human-Readable Memory for Reinforcement Learning | Jun 15, 2023 | Dota 2, Language Modelling | Code Available | 1 |
| SPRING: Studying the Paper and Reasoning to Play Games | May 24, 2023 | Language Modelling, Large Language Model | Code Available | 1 |
| When and What to Ask Through World States and Text Instructions: IGLU NLP Challenge Solution | May 9, 2023 | Classification, Minecraft | Code Available | 1 |