| Benchmarking bias: Expanding clinical AI model card to incorporate bias reporting of social and non-social factors | Nov 21, 2023 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks | Nov 23, 2022 | Benchmarking, Deep Learning | —Unverified | 0 | 0 |
| Official-NV: An LLM-Generated News Video Dataset for Multimodal Fake News Detection | Jul 28, 2024 | Benchmarking, Fake News Detection | —Unverified | 0 | 0 |
| Off-policy Evaluation for Payments at Adyen | Jan 15, 2025 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation | Jul 11, 2023 | Benchmarking, Causal Discovery | —Unverified | 0 | 0 |
| TransBench: Benchmarking Machine Translation for Industrial-Scale Applications | May 20, 2025 | Benchmarking, Machine Translation | —Unverified | 0 | 0 |
| OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics | Jun 12, 2025 | Benchmarking | —Unverified | 0 | 0 |
| IBB Traffic Graph Data: Benchmarking and Road Traffic Prediction Model | Aug 2, 2024 | Benchmarking, Feature Engineering | —Unverified | 0 | 0 |
| Benchmarking Azerbaijani Neural Machine Translation | Jul 29, 2022 | Benchmarking, Domain Generalization | —Unverified | 0 | 0 |
| Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver | Nov 20, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking | Jun 6, 2024 | 6D Pose Estimation using RGB, Benchmarking | —Unverified | 0 | 0 |
| Benchmarking AutoML Frameworks for Disease Prediction Using Medical Claims | Jul 22, 2021 | AutoML, Benchmarking | —Unverified | 0 | 0 |
| Omnibenchmark (alpha) for continuous and open benchmarking in bioinformatics | Sep 25, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics | Feb 18, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions | Dec 9, 2024 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Benchmarking Automated Review Response Generation for the Hospitality Domain | Dec 1, 2020 | Benchmarking, Domain Adaptation | —Unverified | 0 | 0 |
| Benchmarking Automated Machine Learning Methods for Price Forecasting Applications | Apr 28, 2023 | AutoML, Benchmarking | —Unverified | 0 | 0 |
| OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB | Oct 9, 2024 | Benchmarking, Diversity | —Unverified | 0 | 0 |
| On Benchmarking Code LLMs for Android Malware Analysis | Apr 1, 2025 | Benchmarking, Malware Analysis | —Unverified | 0 | 0 |
| On Benchmarking Iris Recognition within a Head-mounted Display for AR/VR Application | Oct 20, 2020 | Benchmarking, Iris Recognition | —Unverified | 0 | 0 |
| On Continual Model Refinement in Out-of-Distribution Data Streams | May 4, 2022 | Benchmarking, Continual Learning | —Unverified | 0 | 0 |
| Active Learning for Community Detection in Stochastic Block Models | May 8, 2016 | Active Learning, Benchmarking | —Unverified | 0 | 0 |
| On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events | Dec 9, 2024 | Benchmarking, Computational Efficiency | —Unverified | 0 | 0 |
| Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos | Jan 1, 2024 | Benchmarking | —Unverified | 0 | 0 |
| On Distribution Grid Optimal Power Flow Development and Integration | Dec 9, 2022 | Benchmarking | —Unverified | 0 | 0 |
| ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities | Dec 9, 2024 | All, Benchmarking | —Unverified | 0 | 0 |
| One Label, One Billion Faces: Usage and Consistency of Racial Categories in Computer Vision | Feb 3, 2021 | Benchmarking, Fairness | —Unverified | 0 | 0 |
| Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese | May 16, 2025 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| One of these (Few) Things is Not Like the Others | May 22, 2020 | Benchmarking, Few-Shot Learning | —Unverified | 0 | 0 |
| Benchmarking Audio Deepfake Detection Robustness in Real-world Communication Scenarios | Apr 16, 2025 | Audio Deepfake Detection, Benchmarking | —Unverified | 0 | 0 |
| One-Shot Federated Learning with Classifier-Free Diffusion Models | Feb 12, 2025 | Benchmarking, Dataset Generation | —Unverified | 0 | 0 |
| On Evaluation of Bangla Word Analogies | Apr 10, 2023 | Benchmarking, Word Embeddings | —Unverified | 0 | 0 |
| On Evaluation of Document Classification using RVL-CDIP | Jun 21, 2023 | Benchmarking, Classification | —Unverified | 0 | 0 |
| Benchmarking Attention Mechanisms and Consistency Regularization Semi-Supervised Learning for Post-Flood Building Damage Assessment in Satellite Images | Dec 4, 2024 | Benchmarking, Building Damage Assessment | —Unverified | 0 | 0 |
| On General Language Understanding | Oct 27, 2023 | Benchmarking, Ethics | —Unverified | 0 | 0 |
| Benchmarking ASR Systems Based on Post-Editing Effort and Error Analysis | Jul 1, 2021 | Benchmarking | —Unverified | 0 | 0 |
| Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions | Aug 7, 2024 | Anomaly Detection, Benchmarking | —Unverified | 0 | 0 |
| Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots | Sep 12, 2024 | Benchmarking, Chatbot | —Unverified | 0 | 0 |
| On loss functions and evaluation metrics for music source separation | Feb 16, 2022 | Audio Source Separation, Benchmarking | —Unverified | 0 | 0 |
| Only Time Can Tell: Discovering Temporal Data for Temporal Modeling | Jul 19, 2019 | Benchmarking, Motion Estimation | —Unverified | 0 | 0 |
| On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction | Jul 15, 2024 | Active Learning, Benchmarking | —Unverified | 0 | 0 |
| An Approach to Evaluate Modeling Adequacy for Small-Signal Stability Analysis of IBR-related SSOs in Multimachine Systems | Mar 12, 2024 | Benchmarking | —Unverified | 0 | 0 |
| On Neural Inertial Classification Networks for Pedestrian Activity Recognition | Feb 23, 2025 | Activity Recognition, Benchmarking | —Unverified | 0 | 0 |
| Zero-Forcing Max-Power Beamforming for Hybrid mmWave Full-Duplex MIMO Systems | Feb 29, 2020 | Benchmarking | —Unverified | 0 | 0 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | May 24, 2023 | Benchmarking, Few-Shot Learning | —Unverified | 0 | 0 |
| On quantifying and improving realism of images generated with diffusion | Sep 26, 2023 | Attribute, Benchmarking | —Unverified | 0 | 0 |
| Active Evaluation Acquisition for Efficient LLM Benchmarking | Oct 8, 2024 | Benchmarking | —Unverified | 0 | 0 |
| On Symbiosis of Attribute Prediction and Semantic Segmentation | Nov 23, 2019 | Attribute, Benchmarking | —Unverified | 0 | 0 |
| On the Assessment of Benchmark Suites for Algorithm Comparison | Apr 15, 2021 | Benchmarking | —Unverified | 0 | 0 |
| On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation | Jul 4, 2024 | Benchmarking, Chatbot | —Unverified | 0 | 0 |