| Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension | May 1, 2022 | Benchmarking, Question Answering | Unverified | 0 |
| AI PERSONA: Towards Life-long Personalization of LLMs | Dec 17, 2024 | Benchmarking | Unverified | 0 |
| Foundations for learning from noisy quantum experiments | Apr 28, 2022 | Benchmarking | Unverified | 0 |
| Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate | May 28, 2025 | Benchmarking | Unverified | 0 |
| Fantastic Questions and Where to Find Them: FairytaleQA--An Authentic Dataset for Narrative Comprehension | Nov 16, 2021 | Benchmarking, Question Answering | Unverified | 0 |
| Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models | Feb 9, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning | May 12, 2025 | 16k, Benchmarking | Unverified | 0 |
| Framework and Benchmarks for Combinatorial and Mixed-variable Bayesian Optimization | Jun 16, 2023 | Bayesian Optimization, Benchmarking | Unverified | 0 |
| FRED: The Florence RGB-Event Drone Dataset | Jun 5, 2025 | Benchmarking, Trajectory Forecasting | Unverified | 0 |
| Benchmarking projective simulation in navigation problems | Apr 23, 2018 | Benchmarking, Q-Learning | Unverified | 0 |
| Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification | May 24, 2024 | Benchmarking, Data Augmentation | Unverified | 0 |
| Benchmarking Single-Image Reflection Removal Algorithms | Oct 1, 2017 | Benchmarking, Reflection Removal | Unverified | 0 |
| A Survey on LLM-based News Recommender Systems | Feb 13, 2025 | Benchmarking, Fairness | Unverified | 0 |
| How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study | Dec 25, 2024 | Benchmarking, Code Generation | Unverified | 0 |
| Human Body Shape Classification Based on a Single Image | May 29, 2023 | Benchmarking, Classification | Unverified | 0 |
| From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | Oct 24, 2024 | Benchmarking, Common Sense Reasoning | Unverified | 0 |
| Benchmarking SMT Performance for Farsi Using the TEP++ Corpus | May 1, 2015 | Benchmarking, Machine Translation | Unverified | 0 |
| From Code to Play: Benchmarking Program Search for Games Using Large Language Models | Dec 5, 2024 | Atari Games, Benchmarking | Unverified | 0 |
| From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks | Apr 14, 2022 | Adversarial Attack, Adversarial Robustness | Unverified | 0 |
| From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | May 17, 2024 | Benchmarking, Multiple-choice | Unverified | 0 |
| Benchmarking Processor Performance by Multi-Threaded Machine Learning Algorithms | Sep 11, 2021 | Benchmarking, BIG-bench Machine Learning | Unverified | 0 |
| FakeWatch ElectionShield: A Benchmarking Framework to Detect Fake News for Credible US Elections | Nov 27, 2023 | Articles, Benchmarking | Unverified | 0 |
| A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images | Feb 27, 2024 | Benchmarking, Defect Detection | Unverified | 0 |
| From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | Aug 5, 2024 | Benchmarking, Code Generation | Unverified | 0 |
| How Good is a Video Summary? A New Benchmarking Dataset and Evaluation Framework Towards Realistic Video Summarization | Jan 26, 2021 | Benchmarking, Supervised Video Summarization | Unverified | 0 |
| Benchmarking Spiking Neural Network Learning Methods with Varying Locality | Feb 1, 2024 | Benchmarking | Unverified | 0 |
| Fairness Index Measures to Evaluate Bias in Biometric Recognition | Jun 19, 2023 | Benchmarking, Fairness | Unverified | 0 |
| Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making | Jun 25, 2024 | Benchmarking, Decision Making | Unverified | 0 |
| Fairness-Aware Graph Neural Networks: A Survey | Jul 8, 2023 | Benchmarking, Fairness | Unverified | 0 |
| From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution | Apr 9, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking State-of-the-Art Deep Learning Software Tools | Aug 25, 2016 | Benchmarking, CPU | Unverified | 0 |
| From Sound Representation to Model Robustness | Jul 27, 2020 | Adversarial Attack, Adversarial Robustness | Unverified | 0 |
| FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs | Oct 25, 2024 | Benchmarking, Fairness | Unverified | 0 |
| Benchmarking state-of-the-art gradient boosting algorithms for classification | May 26, 2023 | Bayesian Optimization, Benchmarking | Unverified | 0 |
| Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images | Dec 12, 2023 | Benchmarking, Retrieval | Unverified | 0 |
| FSD-10: A Dataset for Competitive Sports Content Analysis | Feb 9, 2020 | Action Recognition, Benchmarking | Unverified | 0 |
| FAIRification of MLC data | Nov 23, 2022 | Benchmarking, Management | Unverified | 0 |
| A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking | Sep 5, 2023 | Benchmarking, Knowledge Distillation | Unverified | 0 |
| How Good Is Neural Combinatorial Optimization? A Systematic Evaluation on the Traveling Salesman Problem | Sep 22, 2022 | Benchmarking, Combinatorial Optimization | Unverified | 0 |
| Full-scale modal testing of a Hawk T1A aircraft for benchmarking vibration-based methods | Oct 6, 2023 | Benchmarking, Experimental Design | Unverified | 0 |
| Full-stack evaluation of Machine Learning inference workloads for RISC-V systems | May 24, 2024 | Benchmarking, Deep Learning | Unverified | 0 |
| A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models | Feb 21, 2024 | Benchmarking, Image to text | Unverified | 0 |
| FunBench: Benchmarking Fundus Reading Skills of MLLMs | Mar 2, 2025 | Anatomy, Benchmarking | Unverified | 0 |
| Functional Code Building Genetic Programming | Jun 9, 2022 | Benchmarking, Program Synthesis | Unverified | 0 |
| Efficient Pauli channel estimation with logarithmic quantum memory | Sep 25, 2023 | Benchmarking | Unverified | 0 |
| A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System | May 3, 2024 | Benchmarking, Collaborative Filtering | Unverified | 0 |
| FuzzWiz -- Fuzzing Framework for Efficient Hardware Coverage | Oct 23, 2024 | Benchmarking | Unverified | 0 |
| Fuzzy Knowledge Distillation from High-Order TSK to Low-Order TSK | Feb 16, 2023 | Benchmarking, Knowledge Distillation | Unverified | 0 |
| A Survey of Spanish Clinical Language Models | Aug 4, 2023 | Benchmarking, Survey | Unverified | 0 |
| AI Matrix - Synthetic Benchmarks for DNN | Nov 27, 2018 | Benchmarking, CPU | Unverified | 0 |