| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Apr 6, 2025 | BenchmarkingCombinatorial Optimization | CodeCode Available | 1 |
| Benchmarking Econometric and Machine Learning Methodologies in Nowcasting | May 6, 2022 | BenchmarkingBIG-bench Machine Learning | CodeCode Available | 1 |
| MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs | Jan 1, 2023 | BenchmarkingGPU | CodeCode Available | 1 |
| MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection for Domain Generalization Task | May 27, 2022 | BenchmarkingDomain Generalization | CodeCode Available | 1 |
| Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Oct 12, 2021 | Benchmarking | CodeCode Available | 1 |
| Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective | Jul 10, 2024 | BenchmarkingDiagnostic | CodeCode Available | 1 |
| Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Nov 15, 2023 | BenchmarkingInstruction Following | CodeCode Available | 1 |
| Benchmarking End-to-End Behavioural Cloning on Video Games | Apr 2, 2020 | Behavioural cloningBenchmarking | CodeCode Available | 1 |
| MIRFLEX: Music Information Retrieval Feature Library for Extraction | Nov 1, 2024 | BenchmarkingInformation Retrieval | CodeCode Available | 1 |
| 3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding | Oct 16, 2023 | Action RecognitionBenchmarking | CodeCode Available | 1 |