| Benchmarking Model Predictive Control Algorithms in Building Optimization Testing Framework (BOPTEST) | Jan 31, 2023 | Benchmarking, Model Predictive Control | —Unverified | 0 |
| A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches | Oct 12, 2023 | Benchmarking, Colorization | —Unverified | 0 |
| Exploration of TPUs for AI Applications | Sep 16, 2023 | Benchmarking, Edge-computing | —Unverified | 0 |
| CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation | May 30, 2025 | Benchmarking, Machine Translation | —Unverified | 0 |
| CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography | Apr 14, 2025 | Benchmarking, Visual Reasoning | —Unverified | 0 |
| Analyzing the Effectiveness of Listwise Reranking with Positional Invariance on Temporal Generalizability | Jul 9, 2024 | Benchmarking, Decoder | —Unverified | 0 |
| CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing | Jan 9, 2025 | Benchmarking, Chatbot | —Unverified | 0 |
| Call for Action: towards the next generation of symbolic regression benchmark | May 6, 2025 | Benchmarking, Diversity | —Unverified | 0 |
| Benchmarking Agility and Reconfigurability in Satellite Systems for Tropical Cyclone Monitoring | Nov 27, 2024 | Benchmarking, Earth Observation | —Unverified | 0 |
| A Data-Driven Method to Identify IBRs with Dominant Participation in Sub-Synchronous Oscillations | May 20, 2025 | Benchmarking | —Unverified | 0 |
| Benchmarking Aggression Identification in Social Media | Aug 1, 2018 | Aggression Identification, Benchmarking | —Unverified | 0 |
| Calibrating chemical multisensory devices for real world applications: An in-depth comparison of quantitative Machine Learning approaches | Aug 30, 2017 | Benchmarking | —Unverified | 0 |
| Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline | Aug 6, 2024 | Benchmarking | —Unverified | 0 |
| Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift | Jul 12, 2025 | Benchmarking, Transfer Learning | —Unverified | 0 |
| Analyzing the behaviour of D'WAVE quantum annealer: fine-tuning parameterization and tests with restrictive Hamiltonian formulations | Jul 1, 2022 | Benchmarking, Combinatorial Optimization | —Unverified | 0 |
| Explicitly Multi-Modal Benchmarks for Multi-Objective Optimization | Oct 7, 2021 | Benchmarking | —Unverified | 0 |
| Exploitation-Guided Exploration for Semantic Embodied Navigation | Nov 6, 2023 | Benchmarking | —Unverified | 0 |
| Exploring and Benchmarking the Planning Capabilities of Large Language Models | Jun 18, 2024 | Benchmarking, In-Context Learning | —Unverified | 0 |
| Extensible Logging and Empirical Attainment Function for IOHexperimenter | Sep 28, 2021 | Benchmarking | —Unverified | 0 |
| CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations | Oct 2, 2024 | Benchmarking, Long Form Question Answering | —Unverified | 0 |
| Benchmarking a foundation LLM on its ability to re-label structure names in accordance with the AAPM TG-263 report | Oct 5, 2023 | Benchmarking | —Unverified | 0 |
| CAFA-evaluator: A Python Tool for Benchmarking Ontological Classification Methods | Oct 10, 2023 | Benchmarking, Prediction | —Unverified | 0 |
| Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic | Oct 23, 2023 | Benchmarking, Instruction Following | —Unverified | 0 |
| Quantum Similarity Testing with Convolutional Neural Networks | Nov 3, 2022 | Benchmarking | —Unverified | 0 |
| Explainable AI using expressive Boolean formulas | Jun 6, 2023 | Benchmarking, Explainable Artificial Intelligence (XAI) | —Unverified | 0 |
| Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering | Sep 13, 2024 | Benchmarking, Binary Classification | —Unverified | 0 |
| Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks | Mar 15, 2024 | Adversarial Attack, Adversarial Robustness | —Unverified | 0 |
| Analyzing Hong Kong's Legal Judgments from a Computational Linguistics point-of-view | May 4, 2023 | Benchmarking, Graph Generation | —Unverified | 0 |
| Benchmarking Adversarial Robustness of Compressed Deep Learning Models | Aug 16, 2023 | Adversarial Robustness, Benchmarking | —Unverified | 0 |
| Business as Rulesual: A Benchmark and Framework for Business Rule Flow Modeling with LLMs | May 24, 2025 | Benchmarking | —Unverified | 0 |
| A Benchmarking Protocol for Pansharpening: Dataset, Preprocessing, and Quality Assessment | Jun 7, 2021 | Benchmarking, Pansharpening | —Unverified | 0 |
| Benchmarking Adversarial Robustness | Dec 26, 2019 | Adversarial Attack, Adversarial Robustness | —Unverified | 0 |
| Experimenting with robotic intra-logistics domains | Apr 26, 2018 | Benchmarking, valid | —Unverified | 0 |
| Building benchmarking frameworks for supporting replicability and reproducibility: spatial and textual analysis as an example | Jul 4, 2020 | Benchmarking, Position | —Unverified | 0 |
| Experimental robustness benchmark of quantum neural network on a superconducting quantum processor | May 22, 2025 | Adversarial Attack, Adversarial Robustness | —Unverified | 0 |
| Benchmarking Adversarially Robust Quantum Machine Learning at Scale | Nov 23, 2022 | Adversarial Attack, Adversarial Attack Detection | —Unverified | 0 |
| Analysis of modular CMA-ES on strict box-constrained problems in the SBOX-COST benchmarking suite | May 24, 2023 | Benchmarking | —Unverified | 0 |
| ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists | Jun 2, 2025 | Benchmarking, Form | —Unverified | 0 |
| Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP) | Oct 14, 2024 | Benchmarking, Multi-Task Learning | —Unverified | 0 |
| Benchmarking adversarial attacks and defenses for time-series data | Aug 30, 2020 | Adversarial Defense, Benchmarking | —Unverified | 0 |
| Analysis of different disparity estimation techniques on aerial stereo image datasets | Oct 9, 2024 | Benchmarking, Depth Estimation | —Unverified | 0 |
| Building a De-identification System for Real Swedish Clinical Text Using Pseudonymised Clinical Text | Nov 1, 2019 | Benchmarking, De-identification | —Unverified | 0 |
| Building a continuous benchmarking ecosystem in bioinformatics | Sep 23, 2024 | Benchmarking | —Unverified | 0 |
| Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches | Apr 22, 2024 | Benchmarking, Diversity | —Unverified | 0 |
| Benchmarking Adaptive Intelligence and Computer Vision on Human-Robot Collaboration | Sep 30, 2024 | Benchmarking, Intent Detection | —Unverified | 0 |
| BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer | May 24, 2023 | Benchmarking, Cross-Lingual Transfer | —Unverified | 0 |
| AT-Drone: Benchmarking Adaptive Teaming in Multi-Drone Pursuit | Feb 13, 2025 | Benchmarking, Edge-computing | —Unverified | 0 |
| BuckTales : A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes | Nov 11, 2024 | Benchmarking, Multi-Object Tracking | —Unverified | 0 |
| Benchmarking Adaptative Variational Quantum Algorithms on QUBO Instances | Aug 3, 2023 | Benchmarking | —Unverified | 0 |
| Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark | Jun 4, 2018 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 |