| CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Dec 7, 2022 | Benchmarking | Code Available | 1 |
| LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite | Sep 28, 2023 | Benchmarking | Code Available | 1 |
| Benchmarking Constraint Inference in Inverse Reinforcement Learning | Jun 20, 2022 | Autonomous Driving, Benchmarking | Code Available | 1 |
| animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics | Jun 3, 2024 | Audio Classification, Benchmarking | Code Available | 1 |
| CodeS: Natural Language to Code Repository via Multi-Layer Sketch | Mar 25, 2024 | Benchmarking | Code Available | 1 |
| Benchmarking large language models for biomedical natural language processing applications and recommendations | May 10, 2023 | Benchmarking, Document Classification | Code Available | 1 |
| An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models | Mar 15, 2024 | Benchmarking, Drug Discovery | Code Available | 1 |
| Benchmarking Counterfactual Image Generation | Mar 29, 2024 | Benchmarking, Conditional Image Generation | Code Available | 1 |
| AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials | Nov 29, 2022 | Benchmarking | Code Available | 1 |
| CLoG: Benchmarking Continual Learning of Image Generation Models | Jun 7, 2024 | Benchmarking, Continual Learning | Code Available | 1 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Nov 10, 2023 | Benchmarking, Cloud Computing | Code Available | 1 |
| Benchmarking Data-driven Surrogate Simulators for Artificial Electromagnetic Materials | Nov 6, 2021 | Benchmarking, Neural Network simulation | Code Available | 1 |
| A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care | Sep 16, 2022 | Benchmarking, Deep Learning | Code Available | 1 |
| Benchmarking Compositionality with Formal Languages | Aug 17, 2022 | Benchmarking, Open-Ended Question Answering | Code Available | 1 |
| AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Nov 26, 2024 | Benchmarking, Text-to-Video Generation | Code Available | 1 |
| Clinical Prompt Learning with Frozen Language Models | May 11, 2022 | Benchmarking, GPU | Code Available | 1 |
| Benchmarking Data Science Agents | Feb 27, 2024 | Benchmarking, Code Generation | Code Available | 1 |
| LEMUR Neural Network Dataset: Towards Seamless AutoML | Apr 14, 2025 | AutoML, Benchmarking | Code Available | 1 |
| ClearPose: Large-scale Transparent Object Dataset and Benchmark | Mar 8, 2022 | Benchmarking, Depth Completion | Code Available | 1 |
| LexRAG: Benchmarking Retrieval-Augmented Generation in Multi-Turn Legal Consultation Conversation | Feb 28, 2025 | Articles, Benchmarking | Code Available | 1 |
| ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models | Nov 29, 2021 | Benchmarking, Physical Simulations | Code Available | 1 |
| MC-Blur: A Comprehensive Benchmark for Image Deblurring | Dec 1, 2021 | Benchmarking, Deblurring | Code Available | 1 |
| Large Scale MRI Collection and Segmentation of Cirrhotic Liver | Oct 6, 2024 | Benchmarking, Diagnostic | Code Available | 1 |
| Light Field Salient Object Detection: A Review and Benchmark | Oct 10, 2020 | Benchmarking, Object | Code Available | 1 |
| Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Apr 24, 2025 | Benchmarking, Math | Code Available | 1 |