| On Evaluation of Bangla Word Analogies | Apr 10, 2023 | Benchmarking, Word Embeddings | —Unverified | 0 |
| On Evaluation of Document Classification using RVL-CDIP | Jun 21, 2023 | Benchmarking, Classification | —Unverified | 0 |
| On General Language Understanding | Oct 27, 2023 | Benchmarking, Ethics | —Unverified | 0 |
| Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions | Aug 7, 2024 | Anomaly Detection, Benchmarking | —Unverified | 0 |
| Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots | Sep 12, 2024 | Benchmarking, Chatbot | —Unverified | 0 |
| On loss functions and evaluation metrics for music source separation | Feb 16, 2022 | Audio Source Separation, Benchmarking | —Unverified | 0 |
| Only Time Can Tell: Discovering Temporal Data for Temporal Modeling | Jul 19, 2019 | Benchmarking, Motion Estimation | —Unverified | 0 |
| On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction | Jul 15, 2024 | Active Learning, Benchmarking | —Unverified | 0 |
| An Approach to Evaluate Modeling Adequacy for Small-Signal Stability Analysis of IBR-related SSOs in Multimachine Systems | Mar 12, 2024 | Benchmarking | —Unverified | 0 |
| On Neural Inertial Classification Networks for Pedestrian Activity Recognition | Feb 23, 2025 | Activity Recognition, Benchmarking | —Unverified | 0 |
| On quantifying and improving realism of images generated with diffusion | Sep 26, 2023 | Attribute, Benchmarking | —Unverified | 0 |
| On Symbiosis of Attribute Prediction and Semantic Segmentation | Nov 23, 2019 | Attribute, Benchmarking | —Unverified | 0 |
| On the Assessment of Benchmark Suites for Algorithm Comparison | Apr 15, 2021 | Benchmarking | —Unverified | 0 |
| On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation | Jul 4, 2024 | Benchmarking, Chatbot | —Unverified | 0 |
| Decisions and Performance Under Bounded Rationality: A Computational Benchmarking Approach | May 26, 2020 | Benchmarking, Decision Making | —Unverified | 0 |
| On the Evaluation of Speech Foundation Models for Spoken Language Understanding | Jun 14, 2024 | Benchmarking, Prediction | —Unverified | 0 |
| On the Evaluation of User Privacy in Deep Neural Networks using Timing Side Channel | Aug 1, 2022 | Benchmarking, image-classification | —Unverified | 0 |
| On the Impact of Data Heterogeneity in Federated Learning Environments with Application to Healthcare Networks | Apr 29, 2024 | Benchmarking, Federated Learning | —Unverified | 0 |
| Broadening the Scope of Neural Network Potentials through Direct Inclusion of Additional Molecular Attributes | Mar 22, 2024 | Benchmarking | —Unverified | 0 |
| On the Interaction of Belief Bias and Explanations | Jun 29, 2021 | Benchmarking | —Unverified | 0 |
| On the Performance of Multimodal Language Models | Oct 4, 2023 | Benchmarking, Binary Classification | —Unverified | 0 |
| On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks | Apr 29, 2025 | Anomaly Detection, Benchmarking | —Unverified | 0 |
| On the project risk baseline: integrating aleatory uncertainty into project scheduling | May 31, 2024 | Benchmarking, Scheduling | —Unverified | 0 |
| On the Real-Time Semantic Segmentation of Aphid Clusters in the Wild | Jul 17, 2023 | Benchmarking, Real-Time Semantic Segmentation | —Unverified | 0 |
| On the reduction of Linear Parameter-Varying State-Space models | Apr 2, 2024 | Benchmarking, Dimensionality Reduction | —Unverified | 0 |
| On the relationship between Benchmarking, Standards and Certification in Robotics and AI | Sep 21, 2023 | Benchmarking | —Unverified | 0 |
| On the Reliability and Validity of Detecting Approval of Political Actors in Tweets | Nov 1, 2020 | Benchmarking, Sentiment Analysis | —Unverified | 0 |
| On the Robustness of Human-Object Interaction Detection against Distribution Shift | Jun 22, 2025 | Benchmarking, Data Augmentation | —Unverified | 0 |
| On the role of benchmarking data sets and simulations in method comparison studies | Aug 2, 2022 | Benchmarking | —Unverified | 0 |
| Optimizer Benchmarking Needs to Account for Hyperparameter Tuning | Oct 25, 2019 | Benchmarking | —Unverified | 0 |
| On the Use of Quality Diversity Algorithms for The Traveling Thief Problem | Dec 16, 2021 | Benchmarking, Diversity | —Unverified | 0 |
| On the Utility of Equivariance and Symmetry Breaking in Deep Learning Architectures on Point Clouds | Jan 1, 2025 | Benchmarking | —Unverified | 0 |
| On the Value of ML Models | Dec 13, 2021 | Benchmarking | —Unverified | 0 |
| OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images | Apr 17, 2023 | 3D Pose Estimation, Benchmarking | —Unverified | 0 |
| OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations | Dec 3, 2024 | Benchmarking, Face Recognition | —Unverified | 0 |
| OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking | May 15, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Open-CD: A Comprehensive Toolbox for Change Detection | Jul 22, 2024 | Benchmarking, Change Detection | —Unverified | 0 |
| OpenContrails: Benchmarking Contrail Detection on GOES-16 ABI | Apr 4, 2023 | Benchmarking | —Unverified | 0 |
| Open Datasets for Satellite Radio Resource Control | Apr 22, 2024 | Benchmarking, Decision Making | —Unverified | 0 |
| OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation | Apr 18, 2025 | Benchmarking | —Unverified | 0 |
| OpenDPD: An Open-Source End-to-End Learning & Benchmarking Framework for Wideband Power Amplifier Modeling and Digital Pre-Distortion | Jan 16, 2024 | Benchmarking | —Unverified | 0 |
| OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety | Mar 18, 2024 | Benchmarking, Mathematical Reasoning | —Unverified | 0 |
| OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation | Feb 25, 2025 | Benchmarking, Semantic Segmentation | —Unverified | 0 |
| Open foundation models for Azerbaijani language | Jul 2, 2024 | Benchmarking | —Unverified | 0 |
| Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs | Oct 16, 2024 | Benchmarking | —Unverified | 0 |
| Open Llama2 Model for the Lithuanian Language | Aug 23, 2024 | Benchmarking, model | —Unverified | 0 |
| OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning | Sep 11, 2022 | Benchmarking, Classification | —Unverified | 0 |
| Open-set object detection: towards unified problem formulation and benchmarking | Nov 8, 2024 | Autonomous Driving, Benchmarking | —Unverified | 0 |
| OpenSiteRec: An Open Dataset for Site Recommendation | Jul 3, 2023 | Benchmarking, Information Retrieval | —Unverified | 0 |
| Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks | Jan 8, 2025 | Benchmarking, Deep Learning | —Unverified | 0 |