| EffiBench: Benchmarking the Efficiency of Automatically Generated Code | Feb 3, 2024 | BenchmarkingCode Completion | CodeCode Available | 2 | 5 |
| InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback | Jun 26, 2023 | BenchmarkingCode Generation | CodeCode Available | 2 | 5 |
| Benchmarking Benchmark Leakage in Large Language Models | Apr 29, 2024 | BenchmarkingMathematical Reasoning | CodeCode Available | 2 | 5 |
| LLM-Based Multi-Agent Systems are Scalable Graph Generative Models | Oct 13, 2024 | BenchmarkingGraph Generation | CodeCode Available | 2 | 5 |
| A large annotated medical image dataset for the development and evaluation of segmentation algorithms | Feb 25, 2019 | BenchmarkingSegmentation | CodeCode Available | 2 | 5 |
| LaMAR: Benchmarking Localization and Mapping for Augmented Reality | Oct 19, 2022 | BenchmarkingDiversity | CodeCode Available | 2 | 5 |
| Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning | May 20, 2024 | BenchmarkingMRI segmentation | CodeCode Available | 2 | 5 |
| An OpenMind for 3D medical vision self-supervised learning | Dec 22, 2024 | BenchmarkingSelf-Supervised Learning | CodeCode Available | 2 | 5 |
| Learning to Fly -- a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control | Mar 3, 2021 | BenchmarkingMulti-agent Reinforcement Learning | CodeCode Available | 2 | 5 |
| State-specific protein-ligand complex structure prediction with a multi-scale deep generative model | Sep 30, 2022 | BenchmarkingBlind Docking | CodeCode Available | 2 | 5 |