| Fairness Index Measures to Evaluate Bias in Biometric Recognition | Jun 19, 2023 | Benchmarking, Fairness | —Unverified | 0 |
| FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning | May 12, 2025 | 16k, Benchmarking | —Unverified | 0 |
| A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect | May 7, 2021 | Benchmarking, Speech-to-Text | —Unverified | 0 |
| Benchmarking Active Learning Strategies for Materials Optimization and Discovery | Apr 12, 2022 | Active Learning, Benchmarking | —Unverified | 0 |
| Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos | Oct 15, 2024 | Benchmarking, Blind Face Restoration | —Unverified | 0 |
| TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models | Jul 30, 2024 | Benchmarking, Code Completion | —Unverified | 0 |
| BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos | Jun 25, 2025 | Artifact Detection, Benchmarking | —Unverified | 0 |
| Bringing Quantum Algorithms to Automated Machine Learning: A Systematic Review of AutoML Frameworks Regarding Extensibility for QML Algorithms | Oct 6, 2023 | AutoML, Benchmarking | —Unverified | 0 |
| Benchmarking Active Learning for NILM | Nov 24, 2024 | Active Learning, Benchmarking | —Unverified | 0 |
| Bridging vision language model (VLM) evaluation gaps with a framework for scalable and cost-effective benchmark generation | Feb 21, 2025 | Benchmarking, Language Modeling | —Unverified | 0 |
| Analysing Features Learned Using Unsupervised Models on Program Embeddings | Jan 1, 2021 | Benchmarking, Binary Classification | —Unverified | 0 |
| Fairness-Aware Graph Neural Networks: A Survey | Jul 8, 2023 | Benchmarking, Fairness | —Unverified | 0 |
| Fantastic Questions and Where to Find Them: FairytaleQA--An Authentic Dataset for Narrative Comprehension | Nov 16, 2021 | Benchmarking, Question Answering | —Unverified | 0 |
| FastEnsemble: Benchmarking and Accelerating Ensemble-based Uncertainty Estimation for Image-to-Image Translation | Sep 29, 2021 | Benchmarking, Image Generation | —Unverified | 0 |
| Toward Bridging the Simulated-to-Real Gap: Benchmarking Super-Resolution on Real Data | Sep 17, 2018 | Benchmarking, Super-Resolution | —Unverified | 0 |
| Analysing Errors of Open Information Extraction Systems | Jul 24, 2017 | Benchmarking, Open Information Extraction | —Unverified | 0 |
| Bridging the Gap Between Theory and Practice: Benchmarking Transfer Evolutionary Optimization | Apr 20, 2024 | Benchmarking | —Unverified | 0 |
| Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking | May 7, 2024 | Benchmarking, Model Selection | —Unverified | 0 |
| Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles | Jan 13, 2025 | Articles, Benchmarking | —Unverified | 0 |
| A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System | May 3, 2024 | Benchmarking, Collaborative Filtering | —Unverified | 0 |
| Benchmarking a Benchmark: How Reliable is MS-COCO? | Nov 5, 2023 | Benchmarking, image-classification | —Unverified | 0 |
| A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management | Nov 29, 2017 | Benchmarking, Deep Reinforcement Learning | —Unverified | 0 |
| Breakpoint: Scalable evaluation of system-level reasoning in LLM code agents | May 30, 2025 | Benchmarking, Code Repair | —Unverified | 0 |
| A new pathway to generative artificial intelligence by minimizing the maximum entropy | Feb 18, 2025 | Benchmarking | —Unverified | 0 |
| Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations | Dec 23, 2024 | Benchmarking, Question Answering | —Unverified | 0 |
| FAIRification of MLC data | Nov 23, 2022 | Benchmarking, Management | —Unverified | 0 |
| BraTS-Path Challenge: Assessing Heterogeneous Histopathologic Brain Tumor Sub-regions | May 17, 2024 | Benchmarking, Prognosis | —Unverified | 0 |
| Adaptive Gradient Methods with Local Guarantees | Mar 2, 2022 | Benchmarking | —Unverified | 0 |
| Object Pose Estimation in Robotics Revisited | Jun 6, 2019 | 3D Pose Estimation, 6D Pose Estimation | —Unverified | 0 |
| BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization | Aug 27, 2024 | 3D Object Detection, Benchmarking | —Unverified | 0 |
| Scale MLPerf-0.6 models on Google TPU-v3 Pods | Sep 21, 2019 | Benchmarking | —Unverified | 0 |
| FACT: Learning Governing Abstractions Behind Integer Sequences | Sep 20, 2022 | Benchmarking | —Unverified | 0 |
| Boundary Detection Benchmarking: Beyond F-Measures | Jun 1, 2013 | Benchmarking, Boundary Detection | —Unverified | 0 |
| BoTTA: Benchmarking on-device Test Time Adaptation | Apr 14, 2025 | Benchmarking, Test-time Adaptation | —Unverified | 0 |
| Benchmarking 3D multi-coil NC-PDNet MRI reconstruction | Nov 8, 2024 | 3D Reconstruction, Benchmarking | —Unverified | 0 |
| Boost Vision Transformer with GPU-Friendly Sparsity and Quantization | May 18, 2023 | Benchmarking, GPU | —Unverified | 0 |
| Benchmarking 3D Human Pose Estimation Models Under Occlusions | Apr 14, 2025 | 3D Human Pose Estimation, Benchmarking | —Unverified | 0 |
| An AI based talent acquisition and benchmarking for job | Aug 12, 2020 | Benchmarking, Cultural Vocal Bursts Intensity Prediction | —Unverified | 0 |
| FactLens: Benchmarking Fine-Grained Fact Verification | Nov 8, 2024 | Benchmarking, Fact Verification | —Unverified | 0 |
| BOOM: Benchmarking Out-Of-distribution Molecular Property Predictions of Machine Learning Models | May 3, 2025 | Benchmarking, Hyperparameter Optimization | —Unverified | 0 |
| Benchmarking 2D Egocentric Hand Pose Datasets | Sep 11, 2024 | Activity Recognition, Benchmarking | —Unverified | 0 |
| BongLLaMA: LLaMA for Bangla Language | Oct 28, 2024 | Benchmarking, Data Augmentation | —Unverified | 0 |
| An Advanced Ensemble Deep Learning Framework for Stock Price Prediction Using VAE, Transformer, and LSTM Model | Mar 28, 2025 | Algorithmic Trading, Benchmarking | —Unverified | 0 |
| Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches | Mar 21, 2023 | Benchmarking, Thompson Sampling | —Unverified | 0 |
| Benchmark for Antibody Binding Affinity Maturation and Design | May 23, 2025 | Benchmarking | —Unverified | 0 |
| ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content | Mar 13, 2025 | Benchmarking, Image Generation | —Unverified | 0 |
| BOLD: Boolean Logic Deep Learning | May 25, 2024 | Benchmarking, Deep Learning | —Unverified | 0 |
| An Accelerated Correlation Filter Tracker | Dec 5, 2019 | Benchmarking, Object Tracking | —Unverified | 0 |
| Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning | Apr 19, 2024 | Benchmarking, counterfactual | —Unverified | 0 |
| Face Detection on Surveillance Images | Oct 22, 2019 | Benchmarking, Face Detection | —Unverified | 0 |