| A Baseline Statistical Method For Robust User-Assisted Multiple Segmentation | Jan 8, 2022 | Benchmarking, Image Segmentation | Code Available | 0 |
| COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting | Mar 29, 2016 | Benchmarking, Multiobjective Optimization | Code Available | 0 |
| VisionAD, a software package of performant anomaly detection algorithms, and Proportion Localised, an interpretable metric | Jun 7, 2024 | Anomaly Detection, Benchmarking | Code Available | 0 |
| CNM: An Interpretable Complex-valued Network for Matching | Apr 10, 2019 | Benchmarking, Question Answering | Code Available | 0 |
| Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling Clustering Algorithms on NUMA Architectures | Nov 17, 2018 | Benchmarking, Clustering | Code Available | 0 |
| QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers | Oct 8, 2024 | Benchmarking | Code Available | 0 |
| TRIAGE: Ethical Benchmarking of AI Models Through Mass Casualty Simulations | Oct 10, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds | Dec 13, 2017 | Benchmarking, Model-based Reinforcement Learning | Code Available | 0 |
| Benchmarking AutoML algorithms on a collection of synthetic classification problems | Dec 6, 2022 | AutoML, Benchmarking | Code Available | 0 |
| Benchmarking a transformer-FREE model for ad-hoc retrieval | Apr 1, 2021 | Benchmarking, CPU | Code Available | 0 |
| Benchmarking Approximate Inference Methods for Neural Structured Prediction | Apr 1, 2019 | Benchmarking, Prediction | Code Available | 0 |
| LMEMs for post-hoc analysis of HPO Benchmarking | Aug 5, 2024 | Benchmarking, Hyperparameter Optimization | Code Available | 0 |
| Benchmarking Contemporary Deep Learning Hardware and Frameworks: A Survey of Qualitative Metrics | Jul 5, 2019 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection | Feb 20, 2018 | Articles, Benchmarking | Code Available | 0 |
| Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification | Sep 21, 2022 | Benchmarking, Management | Code Available | 0 |
| Who’s on First?: Probing the Learning and Representation Capabilities of Language Models on Deterministic Closed Domains | Nov 1, 2021 | Benchmarking, Language Modeling | Code Available | 0 |
| TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models | Oct 16, 2023 | Automated Theorem Proving, Benchmarking | Code Available | 0 |
| Quality Indicators for Preference-based Evolutionary Multi-objective Optimization Using a Reference Point: A Review and Analysis | Jan 28, 2023 | Benchmarking, Decision Making | Code Available | 0 |
| CLMB: deep contrastive learning for robust metagenomic binning | Nov 18, 2021 | Benchmarking, Contrastive Learning | Code Available | 0 |
| Investigation of UAV Detection in Images with Complex Backgrounds and Rainy Artifacts | May 25, 2023 | Benchmarking, object-detection | Code Available | 0 |
| Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Nov 18, 2024 | Benchmarking, Multimodal Large Language Model | Code Available | 0 |
| Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems | Apr 4, 2025 | Benchmarking, Model Selection | Code Available | 0 |
| Task-Agnostic Graph Neural Network Evaluation via Adversarial Collaboration | Jan 27, 2023 | Benchmarking, Graph Classification | Code Available | 0 |
| Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System | Jul 28, 2023 | Anomaly Detection, Autonomous Driving | Code Available | 0 |
| Benchmarking and Understanding Compositional Relational Reasoning of LLMs | Dec 17, 2024 | Benchmarking, Relational Reasoning | Code Available | 0 |
| Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases | Mar 6, 2025 | Benchmarking, Diagnostic | Code Available | 0 |
| A New Cervical Cytology Dataset for Nucleus Detection and Image Classification (Cervix93) and Methods for Cervical Nucleus Detection | Nov 23, 2018 | Benchmarking, Cervical Nucleus Detection | Code Available | 0 |
| ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures | Jun 14, 2024 | Answer Generation, Benchmarking | Code Available | 0 |
| Benchmarking and Rethinking Knowledge Editing for Large Language Models | May 24, 2025 | Benchmarking, knowledge editing | Code Available | 0 |
| CLEAVE: Scalable and Edge-native Benchmarking of Networked Control Systems | Apr 5, 2022 | Benchmarking, Edge-computing | Code Available | 0 |
| Quantitative Metrics for Benchmarking Human-Aware Robot Navigation | Jul 26, 2023 | Benchmarking, Robot Navigation | Code Available | 0 |
| Benchmarking and optimizing organism wide single-cell RNA alignment methods | Mar 26, 2025 | Benchmarking, Decoder | Code Available | 0 |
| XTSC-Bench: Quantitative Benchmarking for Explainers on Time Series Classification | Oct 23, 2023 | Benchmarking, Time Series | Code Available | 0 |
| CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models | Mar 6, 2025 | Benchmarking, Continual Learning | Code Available | 0 |
| Benchmarking and Improving Text-to-SQL Generation under Ambiguity | Oct 20, 2023 | Benchmarking, Diversity | Code Available | 0 |
| Quantum Boosting using Domain-Partitioning Hypotheses | Oct 25, 2021 | Benchmarking, Ensemble Learning | Code Available | 0 |
| TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs | May 16, 2025 | Benchmarking, Question Answering | Code Available | 0 |
| Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation | Apr 5, 2024 | Attribute, Benchmarking | Code Available | 0 |
| Multi-GPU-Enabled Hybrid Quantum-Classical Workflow in Quantum-HPC Middleware: Applications in Quantum Simulations | Mar 9, 2024 | Benchmarking, CPU | Code Available | 0 |
| TDBench: Benchmarking Vision-Language Models in Understanding Top-Down Images | Apr 1, 2025 | Autonomous Navigation, Benchmarking | Code Available | 0 |
| A new baseline for retinal vessel segmentation: Numerical identification and correction of methodological inconsistencies affecting 100+ papers | Nov 6, 2021 | Benchmarking, Retinal Vessel Segmentation | Code Available | 0 |
| Adversarial Environment Generation for Learning to Navigate the Web | Mar 2, 2021 | Benchmarking, Decision Making | Code Available | 0 |
| A*3D Dataset: Towards Autonomous Driving in Challenging Environments | Sep 17, 2019 | 3D Object Detection, Autonomous Driving | Code Available | 0 |
| TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring | Mar 23, 2024 | Benchmarking, Text to SQL | Code Available | 0 |
| Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies | Mar 11, 2024 | Benchmarking, Data Augmentation | Code Available | 0 |
| Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample | Jan 28, 2019 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| Quaternion Capsule Networks | Jul 8, 2020 | Benchmarking, Object Recognition | Code Available | 0 |
| QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results | Dec 19, 2021 | Benchmarking, Brain Tumor Segmentation | Code Available | 0 |
| QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs | Dec 16, 2024 | Benchmarking, Common Sense Reasoning | Code Available | 0 |
| Question-Answering Dense Video Events | Sep 6, 2024 | Benchmarking, Question Answering | Code Available | 0 |