| Benchmarking Counterfactual Image Generation | Mar 29, 2024 | BenchmarkingConditional Image Generation | CodeCode Available | 1 | 5 |
| Benchmarking Constraint Inference in Inverse Reinforcement Learning | Jun 20, 2022 | Autonomous DrivingBenchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Data-driven Surrogate Simulators for Artificial Electromagnetic Materials | Nov 6, 2021 | BenchmarkingNeural Network simulation | CodeCode Available | 1 | 5 |
| CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Jan 21, 2024 | Benchmarking | CodeCode Available | 1 | 5 |
| Chaos as an interpretable benchmark for forecasting and data-driven modelling | Oct 11, 2021 | BenchmarkingSymbolic Regression | CodeCode Available | 1 | 5 |
| CharacterBench: Benchmarking Character Customization of Large Language Models | Dec 16, 2024 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset | Sep 16, 2021 | BenchmarkingKnowledge Base Population | CodeCode Available | 1 | 5 |
| Benchmarking Data Science Agents | Feb 27, 2024 | BenchmarkingCode Generation | CodeCode Available | 1 | 5 |
| CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness | Jul 13, 2020 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations | Mar 21, 2024 | BenchmarkingMemorization | CodeCode Available | 1 | 5 |