| Benchmarking Quality-Diversity Algorithms on Neuroevolution for Reinforcement Learning | Nov 4, 2022 | Benchmarking, Diversity | —Unverified | 0 |
| Benchmarking Quality-Dependent and Cost-Sensitive Score-Level Multimodal Biometric Fusion Algorithms | Nov 17, 2021 | Benchmarking | —Unverified | 0 |
| Foundations for learning from noisy quantum experiments | Apr 28, 2022 | Benchmarking | —Unverified | 0 |
| Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate | May 28, 2025 | Benchmarking | —Unverified | 0 |
| FarsBase-KBP: A Knowledge Base Population System for the Persian Knowledge Graph | May 4, 2020 | Benchmarking, Entity Linking | —Unverified | 0 |
| Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension | May 1, 2022 | Benchmarking, Question Answering | —Unverified | 0 |
| AI PERSONA: Towards Life-long Personalization of LLMs | Dec 17, 2024 | Benchmarking | —Unverified | 0 |
| Fantastic Questions and Where to Find Them: FairytaleQA--An Authentic Dataset for Narrative Comprehension | Nov 16, 2021 | Benchmarking, Question Answering | —Unverified | 0 |
| FRED: The Florence RGB-Event Drone Dataset | Jun 5, 2025 | Benchmarking, Trajectory Forecasting | —Unverified | 0 |
| Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models | Feb 9, 2025 | Benchmarking, Code Generation | —Unverified | 0 |
| Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification | May 24, 2024 | Benchmarking, Data Augmentation | —Unverified | 0 |
| Benchmarking Single-Image Reflection Removal Algorithms | Oct 1, 2017 | Benchmarking, Reflection Removal | —Unverified | 0 |
| FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning | May 12, 2025 | 16k, Benchmarking | —Unverified | 0 |
| From Audio Encoders to Piano Judges: Benchmarking Performance Understanding for Solo Piano | Jul 5, 2024 | Attribute, Benchmarking | —Unverified | 0 |
| Benchmarking projective simulation in navigation problems | Apr 23, 2018 | Benchmarking, Q-Learning | —Unverified | 0 |
| From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | Oct 24, 2024 | Benchmarking, Common Sense Reasoning | —Unverified | 0 |
| A Survey on LLM-based News Recommender Systems | Feb 13, 2025 | Benchmarking, Fairness | —Unverified | 0 |
| From Code to Play: Benchmarking Program Search for Games Using Large Language Models | Dec 5, 2024 | Atari Games, Benchmarking | —Unverified | 0 |
| From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks | Apr 14, 2022 | Adversarial Attack, Adversarial Robustness | —Unverified | 0 |
| From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | May 17, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 |
| Holistic Multi-View Building Analysis in the Wild with Projection Pooling | Aug 23, 2020 | Benchmarking | —Unverified | 0 |
| How Aligned are Different Alignment Metrics? | Jul 10, 2024 | Benchmarking | —Unverified | 0 |
| A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images | Feb 27, 2024 | Benchmarking, Defect Detection | —Unverified | 0 |
| From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | Aug 5, 2024 | Benchmarking, Code Generation | —Unverified | 0 |
| Benchmarking Processor Performance by Multi-Threaded Machine Learning Algorithms | Sep 11, 2021 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 |