| On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation | Nov 14, 2023 | Benchmarking, Machine Translation | Code Available | 0 |
| Are Large Language Models Good at Utility Judgments? | Mar 28, 2024 | Answer Generation, Benchmarking | Code Available | 0 |
| Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms | Jul 1, 2022 | Benchmarking, Classification | Code Available | 0 |
| Distributed Non-Convex Optimization with Sublinear Speedup under Intermittent Client Availability | Feb 18, 2020 | Benchmarking, Federated Learning | Code Available | 0 |
| VitaGraph: Building a Knowledge Graph for Biologically Relevant Learning Tasks | May 16, 2025 | Benchmarking, Link Prediction | Code Available | 0 |
| Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI | Mar 7, 2024 | Benchmarking | Code Available | 0 |
| Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions | Aug 2, 2024 | Benchmarking, multimodal interaction | Code Available | 0 |
| DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions | May 8, 2025 | Autonomous Navigation, Benchmarking | Code Available | 0 |
| OpenBioLink: A benchmarking framework for large-scale biomedical link prediction | Dec 10, 2019 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| DispaRisk: Auditing Fairness Through Usable Information | May 20, 2024 | Benchmarking, Bias Detection | Code Available | 0 |
| A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting | Apr 15, 2024 | Benchmarking | Code Available | 0 |
| Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction | Jun 20, 2023 | Benchmarking, Document-level Relation Extraction | Code Available | 0 |
| Large Scale Clustering with Variational EM for Gaussian Mixture Models | Oct 1, 2018 | Benchmarking, Clustering | Code Available | 0 |
| AI Sound Recognition on Asthma Medication Adherence: Evaluation With the RDA Benchmark Suite | Feb 8, 2023 | Benchmarking, Management | Code Available | 0 |
| Dialogue Quality and Emotion Annotations for Customer Support Conversations | Nov 23, 2023 | Benchmarking, Diversity | Code Available | 0 |
| STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking | May 16, 2025 | Benchmarking | Code Available | 0 |
| OpenDenoising: an Extensible Benchmark for Building Comparative Studies of Image Denoisers | Oct 18, 2019 | Benchmarking, Denoising | Code Available | 0 |
| OpenDMC: An Open-Source Library and Performance Evaluation for Deep-learning-based Multi-frame Compression | Oct 27, 2023 | Benchmarking, GPU | Code Available | 0 |
| Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework | Oct 24, 2024 | Benchmarking, Diversity | Code Available | 0 |
| Towards Biologically Plausible and Private Gene Expression Data Generation | Feb 7, 2024 | Benchmarking | Code Available | 0 |
| DFEE: Interactive DataFlow Execution and Evaluation Kit | Dec 4, 2022 | Benchmarking, Scheduling | Code Available | 0 |
| Towards causal benchmarking of bias in face analysis algorithms | Jul 13, 2020 | Attribute, Benchmarking | Code Available | 0 |
| SORCE: Small Object Retrieval in Complex Environments | May 30, 2025 | Benchmarking, Image Retrieval | Code Available | 0 |
| Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings | Apr 4, 2025 | Benchmarking | Code Available | 0 |
| Recognizing Object Affordances to Support Scene Reasoning for Manipulation Tasks | Sep 12, 2019 | Affordance Detection, Affordance Recognition | Code Available | 0 |