| Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all | Oct 17, 2024 | All, Benchmarking | Code Available | 1 | 5 |
| ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing | Mar 30, 2023 | Attribute, Benchmarking | Code Available | 1 | 5 |
| Implicit Multi-Spectral Transformer: An Lightweight and Effective Visible to Infrared Image Translation Model | Apr 10, 2024 | Benchmarking, Image-to-Image Translation | Code Available | 1 | 5 |
| Benchmarking emergency department triage prediction models with machine learning and large public electronic health records | Nov 22, 2021 | Benchmarking | Code Available | 1 | 5 |
| SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It) | Jun 25, 2024 | Benchmarking, Experimental Design | Code Available | 1 | 5 |
| CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification | Jun 18, 2023 | Benchmarking, Retrieval | Code Available | 1 | 5 |
| Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks | Jun 14, 2020 | Benchmarking, Deep Reinforcement Learning | Code Available | 1 | 5 |
| Benchmarking Language Models for Code Syntax Understanding | Oct 26, 2022 | Benchmarking | Code Available | 1 | 5 |
| TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction | Nov 16, 2023 | Benchmarking, Event Extraction | Code Available | 1 | 5 |
| Illuminating Darkness: Enhancing Real-world Low-light Scenes with Smartphone Images | Mar 10, 2025 | 4k, Benchmarking | Code Available | 1 | 5 |
| A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges | Oct 21, 2022 | Benchmarking, Community Detection | Code Available | 1 | 5 |
| MEGA: Multilingual Evaluation of Generative AI | Mar 22, 2023 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Language Model Creativity: A Case Study on Code Generation | Jul 12, 2024 | Benchmarking, Code Generation | Code Available | 1 | 5 |
| Benchmarking the Spectrum of Agent Capabilities | Sep 14, 2021 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking of DL Libraries and Models on Mobile Devices | Feb 14, 2022 | Benchmarking, GPU | Code Available | 1 | 5 |
| MetaFormer and CNN Hybrid Model for Polyp Image Segmentation | Sep 16, 2024 | Benchmarking, Image Segmentation | Code Available | 1 | 5 |
| Meta-Surrogate Benchmarking for Hyperparameter Optimization | May 30, 2019 | Benchmarking, Hyperparameter Optimization | Code Available | 1 | 5 |
| Benchmarking Quantized Neural Networks on FPGAs with FINN | Feb 2, 2021 | Benchmarking, Quantization | Code Available | 1 | 5 |
| Image Colorization: A Survey and Dataset | Aug 25, 2020 | Benchmarking, Colorization | Code Available | 1 | 5 |
| MGTBench: Benchmarking Machine-Generated Text Detection | Mar 26, 2023 | Benchmarking, Question Answering | Code Available | 1 | 5 |
| IDToolkit: A Toolkit for Benchmarking and Developing Inverse Design Algorithms in Nanophotonics | May 30, 2023 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions | Oct 13, 2021 | Benchmarking, Computational Efficiency | Code Available | 1 | 5 |
| Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation | Feb 18, 2024 | Benchmarking, Language Modeling | Code Available | 1 | 5 |
| MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection for Domain Generalization Task | May 27, 2022 | Benchmarking, Domain Generalization | Code Available | 1 | 5 |
| Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions | Mar 29, 2024 | Action Detection, Benchmarking | Code Available | 1 | 5 |
| Contemporary Symbolic Regression Methods and their Relative Performance | Jul 29, 2021 | Benchmarking, parameter estimation | Code Available | 1 | 5 |
| Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph | May 23, 2025 | Benchmarking, Management | Code Available | 1 | 5 |
| minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models | Mar 24, 2022 | Benchmarking, Sentence | Code Available | 1 | 5 |
| ILIAS: Instance-Level Image retrieval At Scale | Feb 17, 2025 | Benchmarking, Image Retrieval | Code Available | 1 | 5 |
| Image Matching across Wide Baselines: From Paper to Practice | Mar 3, 2020 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining | Nov 22, 2017 | Benchmarking, feature selection | Code Available | 1 | 5 |
| Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology | Jun 30, 2022 | Benchmarking, Diagnostic | Code Available | 1 | 5 |
| Benchmarking the Performance of Bayesian Optimization across Multiple Experimental Materials Science Domains | May 23, 2021 | Active Learning, Bayesian Optimisation | Code Available | 1 | 5 |
| iAMPCN: a deep-learning approach for identifying antimicrobial peptides and their functional activities | Jun 27, 2024 | Benchmarking | Code Available | 1 | 5 |
| AirSim Drone Racing Lab | Mar 12, 2020 | Benchmarking, Optical Flow Estimation | Code Available | 1 | 5 |
| A framework for benchmarking clustering algorithms | Sep 20, 2022 | Benchmarking, Clustering | Code Available | 1 | 5 |
| ICU-Sepsis: A Benchmark MDP Built from Real Medical Data | Jun 9, 2024 | Benchmarking, Management | Code Available | 1 | 5 |
| A Comprehensive Overview of Large Language Models | Jul 12, 2023 | Benchmarking | Code Available | 1 | 5 |
| CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions | Jun 26, 2025 | Benchmarking, Drug Design | Code Available | 1 | 5 |
| Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering | May 22, 2025 | Benchmarking, Evidence Selection | Code Available | 1 | 5 |
| Benchmarking the Generation of Fact Checking Explanations | Aug 29, 2023 | Abstractive Text Summarization, Articles | Code Available | 1 | 5 |
| Arctique: An artificial histopathological dataset unifying realism and controllability for uncertainty quantification | Nov 11, 2024 | Benchmarking, Image Segmentation | Code Available | 1 | 5 |
| A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis | Aug 12, 2021 | Benchmarking, Medical Image Analysis | Code Available | 1 | 5 |
| Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Nov 4, 2024 | Action Generation, Benchmarking | Code Available | 1 | 5 |
| Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection | May 30, 2022 | 3D Object Detection, Autonomous Driving | Code Available | 1 | 5 |
| A framework for benchmarking class-out-of-distribution detection and its application to ImageNet | Feb 23, 2023 | Benchmarking, Knowledge Distillation | Code Available | 1 | 5 |
| Benchmarking TinyML Systems: Challenges and Direction | Mar 10, 2020 | Benchmarking, Position | Code Available | 1 | 5 |
| Geometric Deep Learning for Structure-Based Drug Design: A Survey | Jun 20, 2023 | Benchmarking, Deep Learning | Code Available | 1 | 5 |
| A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Jun 1, 2022 | Benchmarking, Emotion Recognition | Code Available | 1 | 5 |
| iDNA-ABF: multi-scale deep biological language learning model for the interpretable prediction of DNA methylations | Oct 17, 2022 | Benchmarking, Text Classification | Code Available | 1 | 5 |