| Benchmarking Adaptative Variational Quantum Algorithms on QUBO Instances | Aug 3, 2023 | Benchmarking | —Unverified | 0 |
| Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark | Jun 4, 2018 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 |
| Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume | Mar 8, 2024 | Adversarial Robustness, Benchmarking | —Unverified | 0 |
| Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion | May 28, 2024 | Benchmarking, Emotion Recognition | —Unverified | 0 |
| A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect | May 7, 2021 | Benchmarking, Speech-to-Text | —Unverified | 0 |
| Benchmarking Active Learning Strategies for Materials Optimization and Discovery | Apr 12, 2022 | Active Learning, Benchmarking | —Unverified | 0 |
| Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos | Oct 15, 2024 | Benchmarking, Blind Face Restoration | —Unverified | 0 |
| TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models | Jul 30, 2024 | Benchmarking, Code Completion | —Unverified | 0 |
| BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos | Jun 25, 2025 | Artifact Detection, Benchmarking | —Unverified | 0 |
| Bringing Quantum Algorithms to Automated Machine Learning: A Systematic Review of AutoML Frameworks Regarding Extensibility for QML Algorithms | Oct 6, 2023 | AutoML, Benchmarking | —Unverified | 0 |
| Benchmarking Active Learning for NILM | Nov 24, 2024 | Active Learning, Benchmarking | —Unverified | 0 |
| Bridging vision language model (VLM) evaluation gaps with a framework for scalable and cost-effective benchmark generation | Feb 21, 2025 | Benchmarking, Language Modeling | —Unverified | 0 |
| Analysing Features Learned Using Unsupervised Models on Program Embeddings | Jan 1, 2021 | Benchmarking, Binary Classification | —Unverified | 0 |
| ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content | Mar 13, 2025 | Benchmarking, Image Generation | —Unverified | 0 |
| Toward Bridging the Simulated-to-Real Gap: Benchmarking Super-Resolution on Real Data | Sep 17, 2018 | Benchmarking, Super-Resolution | —Unverified | 0 |
| Analysing Errors of Open Information Extraction Systems | Jul 24, 2017 | Benchmarking, Open Information Extraction | —Unverified | 0 |
| Exploring Capabilities of Time Series Foundation Models in Building Analytics | Oct 28, 2024 | Benchmarking, energy management | —Unverified | 0 |
| Bridging the Gap Between Theory and Practice: Benchmarking Transfer Evolutionary Optimization | Apr 20, 2024 | Benchmarking | —Unverified | 0 |
| Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking | May 7, 2024 | Benchmarking, Model Selection | —Unverified | 0 |
| Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles | Jan 13, 2025 | Articles, Benchmarking | —Unverified | 0 |
| Exploring Continual Learning of Diffusion Models | Mar 27, 2023 | Benchmarking, Continual Learning | —Unverified | 0 |
| Benchmarking a Benchmark: How Reliable is MS-COCO? | Nov 5, 2023 | Benchmarking, image-classification | —Unverified | 0 |
| A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management | Nov 29, 2017 | Benchmarking, Deep Reinforcement Learning | —Unverified | 0 |
| Breakpoint: Scalable evaluation of system-level reasoning in LLM code agents | May 30, 2025 | Benchmarking, Code Repair | —Unverified | 0 |
| A new pathway to generative artificial intelligence by minimizing the maximum entropy | Feb 18, 2025 | Benchmarking | —Unverified | 0 |