| CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness | Jul 13, 2020 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Constraint Inference in Inverse Reinforcement Learning | Jun 20, 2022 | Autonomous Driving, Benchmarking | Code Available | 1 | 5 |
| Benchmarking Counterfactual Image Generation | Mar 29, 2024 | Benchmarking, Conditional Image Generation | Code Available | 1 | 5 |
| CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Feb 20, 2024 | Atomic number classification, Benchmarking | Code Available | 1 | 5 |
| Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset | Sep 16, 2021 | Benchmarking, Knowledge Base Population | Code Available | 1 | 5 |
| A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification | Nov 28, 2022 | Benchmarking, image-classification | Code Available | 1 | 5 |
| Benchmarking Data Science Agents | Feb 27, 2024 | Benchmarking, Code Generation | Code Available | 1 | 5 |
| CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Jan 21, 2024 | Benchmarking | Code Available | 1 | 5 |
| CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Jul 15, 2024 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration | May 10, 2024 | Benchmarking, Point Cloud Registration | Code Available | 1 | 5 |