| A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection | Jun 5, 2024 | Anomaly Detection, Benchmarking | —Unverified | 0 | 0 |
| Towards a Multidimensional Evaluation Framework for Empathetic Conversational Systems | Jul 26, 2024 | Benchmarking | —Unverified | 0 | 0 |
| MA-BBOB: A Problem Generator for Black-Box Optimization Using Affine Combinations and Shifts | Dec 18, 2023 | Benchmarking | —Unverified | 0 | 0 |
| MA-BBOB: Many-Affine Combinations of BBOB Functions for Evaluating AutoML Approaches in Noiseless Numerical Black-Box Optimization Contexts | Jun 18, 2023 | AutoML, Benchmarking | —Unverified | 0 | 0 |
| Towards an AI Accountability Policy | Jul 25, 2023 | Benchmarking, Fairness | —Unverified | 0 | 0 |
| Machine Generated Product Advertisements: Benchmarking LLMs Against Human Performance | Dec 27, 2024 | Benchmarking, Persuasiveness | —Unverified | 0 | 0 |
| Towards an Automated SOAP Note: Classifying Utterances from Medical Conversations | Jul 17, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| A Density-Guided Temporal Attention Transformer for Indiscernible Object Counting in Underwater Video | Mar 6, 2024 | Benchmarking, Crowd Counting | —Unverified | 0 | 0 |
| Machine Learning-Based Analysis of ECG and PCG Signals for Rheumatic Heart Disease Detection: A Scoping Review (2015-2025) | May 17, 2025 | Benchmarking, Diagnostic | —Unverified | 0 | 0 |
| Towards a Taxonomy of Graph Learning Datasets | Oct 27, 2021 | Benchmarking, Graph Learning | —Unverified | 0 | 0 |
| Machine Learning for Identifying Grain Boundaries in Scanning Electron Microscopy (SEM) Images of Nanoparticle Superlattices | Jan 7, 2025 | Benchmarking, Clustering | —Unverified | 0 | 0 |
| Machine learning for modelling unstructured grid data in computational physics: a review | Feb 13, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Towards a Theory-Guided Benchmarking Suite for Discrete Black-Box Optimization Heuristics: Profiling (1+λ) EA Variants on OneMax and LeadingOnes | Aug 17, 2018 | Benchmarking, Evolutionary Algorithms | —Unverified | 0 | 0 |
| Machine Learning for Ranking f-wave Extraction Methods in Single-Lead ECGs | Jul 17, 2023 | Benchmarking | —Unverified | 0 | 0 |
| Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving | Aug 19, 2024 | Benchmarking, Machine Translation | —Unverified | 0 | 0 |
| Uncertainty estimation of machine learning spatial precipitation predictions from satellite data | Nov 13, 2023 | Benchmarking, Feature Importance | —Unverified | 0 | 0 |
| Benchmarking LLMs for Mimicking Child-Caregiver Language in Interaction | Dec 12, 2024 | Benchmarking, Diversity | —Unverified | 0 | 0 |
| Benchmarking LLMs and SLMs for patient reported outcomes | Dec 20, 2024 | Benchmarking, Privacy Preserving | —Unverified | 0 | 0 |
| Benchmarking LLM powered Chatbots: Methods and Metrics | Aug 8, 2023 | Benchmarking, Chatbot | —Unverified | 0 | 0 |
| Machine Vision based Sample-Tube Localization for Mars Sample Return | Mar 17, 2021 | Benchmarking, Template Matching | —Unverified | 0 | 0 |
| Benchmarking LLM Guardrails in Handling Multilingual Toxicity | Oct 29, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3 | Apr 22, 2025 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Towards a Unified Framework for Determining Conformational Ensembles of Disordered Proteins | Apr 4, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Towards Benchmarking and Assessing the Safety and Robustness of Autonomous Driving on Safety-critical Scenarios | Mar 31, 2025 | Adversarial Attack, Autonomous Driving | —Unverified | 0 | 0 |
| Making Sense of Data in the Wild: Data Analysis Automation at Scale | Jan 27, 2025 | Benchmarking, Diversity | —Unverified | 0 | 0 |
| OrionBench: Benchmarking Time Series Generative Models in the Service of the End-User | Oct 26, 2023 | Anomaly Detection, Benchmarking | —Unverified | 0 | 0 |
| A Deep Q-Learning Method for Downlink Power Allocation in Multi-Cell Networks | Apr 30, 2019 | Benchmarking, Deep Reinforcement Learning | —Unverified | 0 | 0 |
| Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages | Sep 1, 2024 | Benchmarking, Code Generation | —Unverified | 0 | 0 |
| Benchmarking LiDAR Sensors for Development and Evaluation of Automotive Perception | Apr 28, 2020 | Benchmarking, Systematic Literature Review | —Unverified | 0 | 0 |
| Towards Benchmarking and Evaluating Deepfake Detection | Mar 4, 2022 | Benchmarking, DeepFake Detection | —Unverified | 0 | 0 |
| ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation | May 14, 2025 | Benchmarking, Deformable Object Manipulation | —Unverified | 0 | 0 |
| MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects | Dec 6, 2024 | 2k, Anomaly Detection | —Unverified | 0 | 0 |
| Deep Patent Landscaping Model Using Transformer and Graph Embedding | Mar 14, 2019 | Benchmarking, Graph Embedding | —Unverified | 0 | 0 |
| Manual Verbalizer Enrichment for Few-Shot Text Classification | Oct 8, 2024 | Benchmarking, Classification | —Unverified | 0 | 0 |
| Towards Benchmarking Explainable Artificial Intelligence Methods | Aug 25, 2022 | Benchmarking, Explainable artificial intelligence | —Unverified | 0 | 0 |
| Mapping global dynamics of benchmark creation and saturation in artificial intelligence | Mar 9, 2022 | Benchmarking | —Unverified | 0 | 0 |
| Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions | Apr 17, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR | Nov 9, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Towards Benchmarking Scene Background Initialization | Jun 12, 2015 | Benchmarking | —Unverified | 0 | 0 |
| MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics | Mar 12, 2025 | Benchmarking, GPU | —Unverified | 0 | 0 |
| Benchmarking Lexical Simplification Systems | May 1, 2016 | Benchmarking, Lexical Simplification | —Unverified | 0 | 0 |
| Towards Benchmarking the Utility of Explanations for Model Debugging | May 10, 2021 | Benchmarking | —Unverified | 0 | 0 |
| WER We Stand: Benchmarking Urdu ASR Models | Sep 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Benchmarking Learnt Radio Localisation under Distribution Shift | Oct 4, 2022 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking learned non-Cartesian k-space trajectories and reconstruction networks | Jan 27, 2022 | Benchmarking | —Unverified | 0 | 0 |
| Match Stereo Videos via Bidirectional Alignment | Sep 30, 2024 | Benchmarking, Stereo Matching | —Unverified | 0 | 0 |
| MaterioMiner -- An ontology-based text mining dataset for extraction of process-structure-property entities | Aug 5, 2024 | Benchmarking, Graph Generation | —Unverified | 0 | 0 |
| PINNs for Medical Image Analysis: A Survey | Aug 2, 2024 | Anatomy, Benchmarking | —Unverified | 0 | 0 |
| (N,K)-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model | Mar 11, 2024 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Benchmarking learned algorithms for computed tomography image reconstruction tasks | Dec 11, 2024 | Benchmarking, Computed Tomography (CT) | —Unverified | 0 | 0 |