| FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding | Nov 16, 2021 | Benchmarking, Natural Language Understanding | —Unverified | 0 | 0 |
| Few-Shot Defect Segmentation Leveraging Abundant Normal Training Samples Through Normal Background Regularization and Crop-and-Paste Operation | Jul 18, 2020 | Anomaly Detection, Benchmarking | —Unverified | 0 | 0 |
| Few-Shot Learning for Industrial Time Series: A Comparative Analysis Using the Example of Screw-Fastening Process Monitoring | Jun 16, 2025 | Benchmarking, Few-Shot Learning | —Unverified | 0 | 0 |
| Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift | Jul 12, 2025 | Benchmarking, Transfer Learning | —Unverified | 0 | 0 |
| AI PERSONA: Towards Life-long Personalization of LLMs | Dec 17, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Fiber Bundle Morphisms as a Framework for Modeling Many-to-Many Maps | Mar 15, 2022 | Benchmarking, Sentiment Analysis | —Unverified | 0 | 0 |
| E(3)-equivariant models cannot learn chirality: Field-based molecular generation | Feb 24, 2024 | Benchmarking, Graph Neural Network | —Unverified | 0 | 0 |
| CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations | Oct 2, 2024 | Benchmarking, Long Form Question Answering | —Unverified | 0 | 0 |
| Filter Methods for Feature Selection in Supervised Machine Learning Applications -- Review and Benchmark | Nov 23, 2021 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| Finance Language Model Evaluation (FLaME) | Jun 18, 2025 | Benchmarking, Language Model Evaluation | —Unverified | 0 | 0 |
| CAFA-evaluator: A Python Tool for Benchmarking Ontological Classification Methods | Oct 10, 2023 | Benchmarking, Prediction | —Unverified | 0 | 0 |
| Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging | Jun 6, 2023 | Benchmarking, Sentence | —Unverified | 0 | 0 |
| Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada | Apr 1, 2021 | Benchmarking, Language Identification | —Unverified | 0 | 0 |
| TEP-GNN: Accurate Execution Time Prediction of Functional Tests using Graph Neural Networks | Aug 25, 2022 | Benchmarking, Graph Neural Network | —Unverified | 0 | 0 |
| Fine-Grained Classification of Pedestrians in Video: Benchmark and State of the Art | May 20, 2016 | Benchmarking, General Classification | —Unverified | 0 | 0 |
| Terabyte-scale supervised 3D training and benchmarking dataset of the mouse kidney | Aug 4, 2021 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| Term-Class-Max-Support (TCMS): A Simple Text Document Categorization Approach Using Term-Class Relevance Measure | Oct 16, 2016 | Benchmarking, Text Categorization | —Unverified | 0 | 0 |
| Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering | Sep 13, 2024 | Benchmarking, Binary Classification | —Unverified | 0 | 0 |
| FineText: Text Classification via Attention-based Language Model Fine-tuning | Oct 25, 2019 | Benchmarking, Classification | —Unverified | 0 | 0 |
| Business as Rulesual: A Benchmark and Framework for Business Rule Flow Modeling with LLMs | May 24, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency | Jan 30, 2025 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets | Oct 7, 2023 | Benchmarking, named-entity-recognition | —Unverified | 0 | 0 |
| FinLoRA: Benchmarking LoRA Methods for Fine-Tuning LLMs on Financial Datasets | May 26, 2025 | Benchmarking, GPU | —Unverified | 0 | 0 |
| Building benchmarking frameworks for supporting replicability and reproducibility: spatial and textual analysis as an example | Jul 4, 2020 | Benchmarking, Position | —Unverified | 0 | 0 |
| FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance | Mar 7, 2025 | Articles, Benchmarking | —Unverified | 0 | 0 |
| FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking | Apr 2, 2025 | 3D Scene Reconstruction, Benchmarking | —Unverified | 0 | 0 |
| AI Matrix - Synthetic Benchmarks for DNN | Nov 27, 2018 | Benchmarking, CPU | —Unverified | 0 | 0 |
| FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures | Jan 1, 2024 | Benchmarking, Instance Segmentation | —Unverified | 0 | 0 |
| Test-driven Software Experimentation with LASSO: an LLM Prompt Benchmarking Example | Oct 11, 2024 | Benchmarking, Code Generation | —Unverified | 0 | 0 |
| FixCLR: Negative-Class Contrastive Learning for Semi-Supervised Domain Generalization | Jun 25, 2025 | Benchmarking, Contrastive Learning | —Unverified | 0 | 0 |
| Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP) | Oct 14, 2024 | Benchmarking, Multi-Task Learning | —Unverified | 0 | 0 |
| FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems | Jun 8, 2023 | Benchmarking, Edge-computing | —Unverified | 0 | 0 |
| Tetrad: Actively Secure 4PC for Secure Training and Inference | Jun 5, 2021 | Benchmarking, Fairness | —Unverified | 0 | 0 |
| FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning | Jan 1, 2024 | Benchmarking, Federated Learning | —Unverified | 0 | 0 |
| FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents | Jun 21, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Text2World: Benchmarking Large Language Models for Symbolic World Model Generation | Feb 18, 2025 | Benchmarking | —Unverified | 0 | 0 |
| FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models | Jun 3, 2025 | Benchmarking, Domain Adaptation | —Unverified | 0 | 0 |
| FlowMind: Automatic Workflow Generation with LLMs | Mar 17, 2024 | Benchmarking, Question Answering | —Unverified | 0 | 0 |
| AI Idea Bench 2025: AI Research Idea Generation Benchmark | Apr 19, 2025 | Benchmarking, scientific discovery | —Unverified | 0 | 0 |
| Building a De-identification System for Real Swedish Clinical Text Using Pseudonymised Clinical Text | Nov 1, 2019 | Benchmarking, De-identification | —Unverified | 0 | 0 |
| A Benchmark for Out of Distribution Detection in Point Cloud 3D Semantic Segmentation | Nov 11, 2022 | 3D Semantic Segmentation, Autonomous Driving | —Unverified | 0 | 0 |
| Fluorescent Neuronal Cells v2: Multi-Task, Multi-Format Annotations for Deep Learning in Microscopy | Jul 26, 2023 | Benchmarking, object-detection | —Unverified | 0 | 0 |
| FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks | Oct 1, 2024 | Benchmarking, Fairness | —Unverified | 0 | 0 |
| Building a continuous benchmarking ecosystem in bioinformatics | Sep 23, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Enhancing Architecture Frameworks by Including Modern Stakeholders and their Views/Viewpoints | Aug 9, 2023 | Benchmarking | —Unverified | 0 | 0 |
| BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer | May 24, 2023 | Benchmarking, Cross-Lingual Transfer | —Unverified | 0 | 0 |
| BuckTales: A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes | Nov 11, 2024 | Benchmarking, Multi-Object Tracking | —Unverified | 0 | 0 |
| ∀uto∃∨∧L: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks | Oct 11, 2024 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |