| Learning to Fly -- a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control | Mar 3, 2021 | Benchmarking, Multi-agent Reinforcement Learning | Code Available | 2 | 5 |
| RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning | Apr 9, 2023 | Benchmarking, Deep Reinforcement Learning | Code Available | 2 | 5 |
| REAL-Colon: A dataset for developing real-world AI applications in colonoscopy | Mar 4, 2024 | Benchmarking | Code Available | 2 | 5 |
| Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph | Jun 21, 2024 | Benchmarking, Text Generation | Code Available | 2 | 5 |
| BARS: Towards Open Benchmarking for Recommender Systems | May 19, 2022 | Benchmarking, Click-Through Rate Prediction | Code Available | 2 | 5 |
| Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach | Aug 31, 2019 | Articles, Benchmarking | Code Available | 2 | 5 |
| COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning | Jan 15, 2021 | Benchmarking, Misinformation | Code Available | 1 | 5 |
| Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels | Jan 30, 2024 | Benchmarking, image-classification | Code Available | 1 | 5 |
| RADAR: Benchmarking Language Models on Imperfect Tabular Data | Jun 9, 2025 | Benchmarking, Missing Values | Code Available | 1 | 5 |
| Benchmarking Bias Mitigation Algorithms in Representation Learning through Fairness Metrics | Jun 8, 2021 | Age And Gender Classification, Benchmarking | Code Available | 1 | 5 |