| On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation | Nov 14, 2023 | Benchmarking, Machine Translation | Code Available | 0 |
| Are Large Language Models Good at Utility Judgments? | Mar 28, 2024 | Answer Generation, Benchmarking | Code Available | 0 |
| Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms | Jul 1, 2022 | Benchmarking, Classification | Code Available | 0 |
| Distributed Non-Convex Optimization with Sublinear Speedup under Intermittent Client Availability | Feb 18, 2020 | Benchmarking, Federated Learning | Code Available | 0 |
| VitaGraph: Building a Knowledge Graph for Biologically Relevant Learning Tasks | May 16, 2025 | Benchmarking, Link Prediction | Code Available | 0 |
| Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI | Mar 7, 2024 | Benchmarking | Code Available | 0 |
| Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions | Aug 2, 2024 | Benchmarking, multimodal interaction | Code Available | 0 |
| DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions | May 8, 2025 | Autonomous Navigation, Benchmarking | Code Available | 0 |
| OpenBioLink: A benchmarking framework for large-scale biomedical link prediction | Dec 10, 2019 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| DispaRisk: Auditing Fairness Through Usable Information | May 20, 2024 | Benchmarking, Bias Detection | Code Available | 0 |
| A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting | Apr 15, 2024 | Benchmarking | Code Available | 0 |
| Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction | Jun 20, 2023 | Benchmarking, Document-level Relation Extraction | Code Available | 0 |
| Large Scale Clustering with Variational EM for Gaussian Mixture Models | Oct 1, 2018 | Benchmarking, Clustering | Code Available | 0 |
| AI Sound Recognition on Asthma Medication Adherence: Evaluation With the RDA Benchmark Suite | Feb 8, 2023 | Benchmarking, Management | Code Available | 0 |
| Dialogue Quality and Emotion Annotations for Customer Support Conversations | Nov 23, 2023 | Benchmarking, Diversity | Code Available | 0 |
| STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking | May 16, 2025 | Benchmarking | Code Available | 0 |
| OpenDenoising: an Extensible Benchmark for Building Comparative Studies of Image Denoisers | Oct 18, 2019 | Benchmarking, Denoising | Code Available | 0 |
| OpenDMC: An Open-Source Library and Performance Evaluation for Deep-learning-based Multi-frame Compression | Oct 27, 2023 | Benchmarking, GPU | Code Available | 0 |
| Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework | Oct 24, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Towards Biologically Plausible and Private Gene Expression Data Generation | Feb 7, 2024 | Benchmarking | Code Available | 0 |
| DFEE: Interactive DataFlow Execution and Evaluation Kit | Dec 4, 2022 | Benchmarking, Scheduling | Code Available | 0 |
| Towards causal benchmarking of bias in face analysis algorithms | Jul 13, 2020 | Attribute, Benchmarking | Code Available | 0 |
| SORCE: Small Object Retrieval in Complex Environments | May 30, 2025 | Benchmarking, Image Retrieval | Code Available | 0 |
| Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings | Apr 4, 2025 | Benchmarking | Code Available | 0 |
| Recognizing Object Affordances to Support Scene Reasoning for Manipulation Tasks | Sep 12, 2019 | Affordance Detection, Affordance Recognition | Code Available | 0 |
| CleanPatrick: A Benchmark for Image Data Cleaning | May 16, 2025 | Benchmarking, Label Error Detection | Code Available | 0 |
| Detecting critical treatment effect bias in small subgroups | Apr 29, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| AI-generated Image Quality Assessment in Visual Communication | Dec 20, 2024 | Benchmarking, Image Quality Assessment | Code Available | 0 |
| SOSD: A Benchmark for Learned Indexes | Nov 29, 2019 | Benchmarking, Management | Code Available | 0 |
| OpenML Benchmarking Suites | Aug 11, 2017 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design | Oct 23, 2023 | Benchmarking, Image Generation | Code Available | 0 |
| Design and implementation of intelligent packet filtering in IoT microcontroller-based devices | May 30, 2023 | Benchmarking | Code Available | 0 |
| OpenOOD: Benchmarking Generalized Out-of-Distribution Detection | Oct 13, 2022 | Anomaly Detection, Benchmarking | Code Available | 0 |
| Dermatological Diagnosis Explainability Benchmark for Convolutional Neural Networks | Feb 23, 2023 | Benchmarking, Medical Diagnosis | Code Available | 0 |
| Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms | Apr 19, 2023 | Benchmarking, Descriptive | Code Available | 0 |
| Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark | Jun 14, 2025 | Benchmarking, Graph Learning | Code Available | 0 |
| Towards Efficient and Scalable Training of Differentially Private Deep Learning | Jun 25, 2024 | Benchmarking, Deep Learning | Code Available | 0 |
| Benchmarking Label Noise in Instance Segmentation: Spatial Noise Matters | Jun 16, 2024 | Benchmarking, Instance Segmentation | Code Available | 0 |
| Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach | May 6, 2025 | Benchmarking, Earth Observation | Code Available | 0 |
| Delta-Influence: Unlearning Poisons via Influence Functions | Nov 20, 2024 | Attribute, Benchmarking | Code Available | 0 |
| Benchmarking Keyword Spotting Efficiency on Neuromorphic Hardware | Dec 4, 2018 | Benchmarking, CPU | Code Available | 0 |
| Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty | Nov 5, 2020 | Adversarial Attack, Benchmarking | Code Available | 0 |
| DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Jun 13, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| Deep Reinforcement Learning for General Video Game AI | Jun 6, 2018 | Atari Games, Benchmarking | Code Available | 0 |
| DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding | Nov 7, 2023 | 3D Reconstruction, Benchmarking | Code Available | 0 |
| Operation-Level Performance Benchmarking of Graph Neural Networks for Scientific Applications | Jul 20, 2022 | Benchmarking | Code Available | 0 |
| DeepOBS: A Deep Learning Optimizer Benchmark Suite | Mar 13, 2019 | Benchmarking, Deep Learning | Code Available | 0 |
| VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Jun 25, 2024 | ARC, Benchmarking | Code Available | 0 |
| OptIForest: Optimal Isolation Forest for Anomaly Detection | Jun 22, 2023 | Anomaly Detection, Benchmarking | Code Available | 0 |
| Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset | May 24, 2025 | Benchmarking, RAG | Code Available | 0 |