| Benchmarking Graph Neural Networks | Mar 2, 2020 | Benchmarking, Graph Classification | Code Available | 2 |
| Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach | Aug 31, 2019 | Articles, Benchmarking | Code Available | 2 |
| Habitat: A Platform for Embodied AI Research | Apr 2, 2019 | Benchmarking, GPU | Code Available | 2 |
| Benchmarking Neural Network Robustness to Common Corruptions and Perturbations | Mar 28, 2019 | Adversarial Defense, Benchmarking | Code Available | 2 |
| A large annotated medical image dataset for the development and evaluation of segmentation algorithms | Feb 25, 2019 | Benchmarking, Segmentation | Code Available | 2 |
| Benchmarking Deep Reinforcement Learning for Continuous Control | Apr 22, 2016 | Action Triplet Recognition, Atari Games | Code Available | 2 |
| LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Jul 5, 2025 | Benchmarking, GPU | Code Available | 1 |
| Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data | Jul 3, 2025 | Benchmarking, Representation Learning | Code Available | 1 |
| CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions | Jun 26, 2025 | Benchmarking, Drug Design | Code Available | 1 |
| WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads | Jun 25, 2025 | Benchmarking | Code Available | 1 |