| Benchmarking Pathology Feature Extractors for Whole Slide Image Classification | Nov 20, 2023 | Benchmarking, image-classification | Code Available | 1 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Nov 10, 2023 | Benchmarking, Cloud Computing | Code Available | 1 |
| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Apr 6, 2025 | Benchmarking, Combinatorial Optimization | Code Available | 1 |
| CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Feb 26, 2025 | Benchmarking, Code Generation | Code Available | 1 |
| AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction | Jul 25, 2024 | Benchmarking, Deep Learning | Code Available | 1 |
| A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging | Apr 26, 2020 | Benchmarking, Left Atrium Segmentation | Code Available | 1 |
| A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models | Mar 31, 2023 | Benchmarking, Causal Discovery | Code Available | 1 |
| A global analysis of metrics used for measuring performance in natural language processing | Apr 25, 2022 | Benchmarking, Machine Translation | Code Available | 1 |
| Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces | May 23, 2023 | Benchmarking | Code Available | 1 |
| Clinical Prompt Learning with Frozen Language Models | May 11, 2022 | Benchmarking, GPU | Code Available | 1 |