| Beyond neural scaling laws: beating power law scaling via data pruning | Jun 29, 2022 | Benchmarking | Code Available | 1 | 5 |
| ClinicRealm: Re-evaluating Large Language Models with Conventional Machine Learning for Non-Generative Clinical Prediction Tasks | Jul 26, 2024 | Benchmarking, Model Selection | Code Available | 1 | 5 |
| IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics | Oct 11, 2018 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification | Jul 6, 2023 | Benchmarking, Domain Adaptation | Code Available | 1 | 5 |
| A framework for benchmarking clustering algorithms | Sep 20, 2022 | Benchmarking, Clustering | Code Available | 1 | 5 |
| Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle? | Sep 29, 2023 | Benchmarking, Knowledge Graph Completion | Code Available | 1 | 5 |
| ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset | Jun 14, 2022 | Benchmarking, Ischemic Stroke Lesion Segmentation | Code Available | 1 | 5 |
| Open Radar Initiative: Large Scale Dataset for Benchmarking of micro-Doppler Recognition Algorithms | May 7, 2021 | Benchmarking | Code Available | 1 | 5 |
| Arctique: An artificial histopathological dataset unifying realism and controllability for uncertainty quantification | Nov 11, 2024 | Benchmarking, Image Segmentation | Code Available | 1 | 5 |
| DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training | Mar 13, 2020 | Benchmarking, Quantization | Code Available | 1 | 5 |
| A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models | Apr 22, 2024 | Benchmarking, World Knowledge | Code Available | 1 | 5 |
| Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs | Sep 18, 2021 | Benchmarking, Complex Query Answering | Code Available | 1 | 5 |
| BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Apr 2, 2025 | 3D Reconstruction, Benchmarking | Code Available | 1 | 5 |
| OPF-Learn: An Open-Source Framework for Creating Representative AC Optimal Power Flow Datasets | Nov 1, 2021 | Benchmarking | Code Available | 1 | 5 |
| Does your model understand genes? A benchmark of gene properties for biological and text models | Dec 5, 2024 | Benchmarking, Multi-class Classification | Code Available | 1 | 5 |
| OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication | Sep 16, 2021 | 3D Object Detection, Benchmarking | Code Available | 1 | 5 |
| Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Jul 16, 2024 | Benchmarking, Code Generation | Code Available | 1 | 5 |
| A framework for benchmarking class-out-of-distribution detection and its application to ImageNet | Feb 23, 2023 | Benchmarking, Knowledge Distillation | Code Available | 1 | 5 |
| Don’t be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System | Nov 1, 2021 | Benchmarking, Response Generation | Code Available | 1 | 5 |
| IOHexperimenter: Benchmarking Platform for Iterative Optimization Heuristics | Nov 7, 2021 | Bayesian Optimization, Benchmarking | Code Available | 1 | 5 |
| Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment | Feb 21, 2024 | Adversarial Robustness, Benchmarking | Code Available | 1 | 5 |
| JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning | Jul 21, 2023 | Benchmarking, Combinatorial Optimization | Code Available | 1 | 5 |
| Kvasir-Instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy | Oct 23, 2020 | Benchmarking, Diagnostic | Code Available | 1 | 5 |
| DomainLab: A modular Python package for domain generalization in deep learning | Mar 21, 2024 | Benchmarking, Domain Generalization | Code Available | 1 | 5 |
| Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks | May 13, 2021 | Benchmarking, Drug Discovery | Code Available | 1 | 5 |
| Introducing Milabench: Benchmarking Accelerators for AI | Nov 18, 2024 | Benchmarking, Deep Learning | Code Available | 1 | 5 |
| Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms | Jul 8, 2021 | Benchmarking | Code Available | 1 | 5 |
| BEND: Benchmarking DNA Language Models on biologically meaningful tasks | Nov 21, 2023 | Benchmarking, Language Modeling | Code Available | 1 | 5 |
| Introducing the VoicePrivacy Initiative | May 4, 2020 | Benchmarking | Code Available | 1 | 5 |
| BenchML: an extensible pipelining framework for benchmarking representations of materials and molecules at scale | Dec 4, 2021 | Benchmarking, Hyperparameter Optimization | Code Available | 1 | 5 |
| Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology | Jun 30, 2022 | Benchmarking, Diagnostic | Code Available | 1 | 5 |
| Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM | Mar 28, 2024 | Benchmarking | Code Available | 1 | 5 |
| Benchmark on Drug Target Interaction Modeling from a Structure Perspective | Jul 4, 2024 | Benchmarking, Drug Discovery | Code Available | 1 | 5 |
| Benchmarks for Deep Off-Policy Evaluation | Mar 30, 2021 | Benchmarking, continuous-control | Code Available | 1 | 5 |
| Intrinsic Image Harmonization | Jun 19, 2021 | Benchmarking, Image Harmonization | Code Available | 1 | 5 |
| Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets | Oct 22, 2020 | Articles, Benchmarking | Code Available | 1 | 5 |
| Align and Distill: Unifying and Improving Domain Adaptive Object Detection | Mar 18, 2024 | Benchmarking, object-detection | Code Available | 1 | 5 |
| Event-Free Moving Object Segmentation from Moving Ego Vehicle | Apr 28, 2023 | Autonomous Driving, Benchmarking | Code Available | 1 | 5 |
| Ducho 2.0: Towards a More Up-to-Date Unified Framework for the Extraction of Multimodal Features in Recommendation | Mar 7, 2024 | Benchmarking, Multimodal Recommendation | Code Available | 1 | 5 |
| Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions | Oct 13, 2021 | Benchmarking, Computational Efficiency | Code Available | 1 | 5 |
| Benchmarking Image Retrieval for Visual Localization | Nov 24, 2020 | Autonomous Driving, Benchmarking | Code Available | 1 | 5 |
| ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Mar 26, 2024 | Benchmarking, Machine Reading Comprehension | Code Available | 1 | 5 |
| Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Dec 10, 2021 | Benchmarking | Code Available | 1 | 5 |
| Interpretable statistical representations of neural population dynamics and geometry | Apr 6, 2023 | Benchmarking, Decision Making | Code Available | 1 | 5 |
| InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Jun 19, 2025 | Benchmarking, Descriptive | Code Available | 1 | 5 |
| Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks | Apr 5, 2022 | Benchmarking | Code Available | 1 | 5 |
| Physiology-based simulation of the retinal vasculature enables annotation-free segmentation of OCT angiographs | Jul 22, 2022 | Benchmarking, Retinal Vessel Segmentation | Code Available | 1 | 5 |
| PIC4rl-gym: a ROS2 modular framework for Robots Autonomous Navigation with Deep Reinforcement Learning | Nov 19, 2022 | Autonomous Navigation, Benchmarking | Code Available | 1 | 5 |
| Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning | May 30, 2024 | Autonomous Driving, Benchmarking | Code Available | 1 | 5 |
| IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation | Jul 13, 2023 | Benchmarking, Graph Embedding | Code Available | 1 | 5 |