Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 5501–5548 of 5548 papers

Title	Date	Tasks	Status
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding	Nov 16, 2021	BenchmarkingNatural Language Understanding	—Unverified
Few-Shot Defect Segmentation Leveraging Abundant Normal Training Samples Through Normal Background Regularization and Crop-and-Paste Operation	Jul 18, 2020	Anomaly DetectionBenchmarking	—Unverified
Few-Shot Learning for Industrial Time Series: A Comparative Analysis Using the Example of Screw-Fastening Process Monitoring	Jun 16, 2025	BenchmarkingFew-Shot Learning	—Unverified
Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift	Jul 12, 2025	BenchmarkingTransfer Learning	—Unverified
AI PERSONA: Towards Life-long Personalization of LLMs	Dec 17, 2024	Benchmarking	—Unverified
Fiber Bundle Morphisms as a Framework for Modeling Many-to-Many Maps	Mar 15, 2022	BenchmarkingSentiment Analysis	—Unverified
E(3)-equivariant models cannot learn chirality: Field-based molecular generation	Feb 24, 2024	BenchmarkingGraph Neural Network	—Unverified
CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations	Oct 2, 2024	BenchmarkingLong Form Question Answering	—Unverified
Filter Methods for Feature Selection in Supervised Machine Learning Applications -- Review and Benchmark	Nov 23, 2021	BenchmarkingBIG-bench Machine Learning	—Unverified
Finance Language Model Evaluation (FLaME)	Jun 18, 2025	BenchmarkingLanguage Model Evaluation	—Unverified
CAFA-evaluator: A Python Tool for Benchmarking Ontological Classification Methods	Oct 10, 2023	BenchmarkingPrediction	—Unverified
Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging	Jun 6, 2023	BenchmarkingSentence	—Unverified
Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada	Apr 1, 2021	BenchmarkingLanguage Identification	—Unverified
TEP-GNN: Accurate Execution Time Prediction of Functional Tests using Graph Neural Networks	Aug 25, 2022	BenchmarkingGraph Neural Network	—Unverified
Fine-Grained Classification of Pedestrians in Video: Benchmark and State of the Art	May 20, 2016	BenchmarkingGeneral Classification	—Unverified
Terabyte-scale supervised 3D training and benchmarking dataset of the mouse kidney	Aug 4, 2021	BenchmarkingBIG-bench Machine Learning	—Unverified
Term-Class-Max-Support (TCMS): A Simple Text Document Categorization Approach Using Term-Class Relevance Measure	Oct 16, 2016	BenchmarkingText Categorization	—Unverified
Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering	Sep 13, 2024	BenchmarkingBinary Classification	—Unverified
FineText: Text Classification via Attention-based Language Model Fine-tuning	Oct 25, 2019	BenchmarkingClassification	—Unverified
Business as Rulesual: A Benchmark and Framework for Business Rule Flow Modeling with LLMs	May 24, 2025	Benchmarking	—Unverified
Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency	Jan 30, 2025	BenchmarkingLanguage Modeling	—Unverified
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets	Oct 7, 2023	Benchmarkingnamed-entity-recognition	—Unverified
FinLoRA: Benchmarking LoRA Methods for Fine-Tuning LLMs on Financial Datasets	May 26, 2025	BenchmarkingGPU	—Unverified
Building benchmarking frameworks for supporting replicability and reproducibility: spatial and textual analysis as an example	Jul 4, 2020	BenchmarkingPosition	—Unverified
FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance	Mar 7, 2025	ArticlesBenchmarking	—Unverified
FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking	Apr 2, 2025	3D Scene ReconstructionBenchmarking	—Unverified
AI Matrix - Synthetic Benchmarks for DNN	Nov 27, 2018	BenchmarkingCPU	—Unverified
FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures	Jan 1, 2024	BenchmarkingInstance Segmentation	—Unverified
Test-driven Software Experimentation with LASSO: an LLM Prompt Benchmarking Example	Oct 11, 2024	BenchmarkingCode Generation	—Unverified
FixCLR: Negative-Class Contrastive Learning for Semi-Supervised Domain Generalization	Jun 25, 2025	BenchmarkingContrastive Learning	—Unverified
Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP)	Oct 14, 2024	BenchmarkingMulti-Task Learning	—Unverified
FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems	Jun 8, 2023	BenchmarkingEdge-computing	—Unverified
Tetrad: Actively Secure 4PC for Secure Training and Inference	Jun 5, 2021	BenchmarkingFairness	—Unverified
FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning	Jan 1, 2024	BenchmarkingFederated Learning	—Unverified
FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents	Jun 21, 2024	Benchmarking	—Unverified
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation	Feb 18, 2025	Benchmarking	—Unverified
FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models	Jun 3, 2025	BenchmarkingDomain Adaptation	—Unverified
FlowMind: Automatic Workflow Generation with LLMs	Mar 17, 2024	BenchmarkingQuestion Answering	—Unverified
AI Idea Bench 2025: AI Research Idea Generation Benchmark	Apr 19, 2025	Benchmarkingscientific discovery	—Unverified
Building a De-identification System for Real Swedish Clinical Text Using Pseudonymised Clinical Text	Nov 1, 2019	BenchmarkingDe-identification	—Unverified
A Benchmark for Out of Distribution Detection in Point Cloud 3D Semantic Segmentation	Nov 11, 2022	3D Semantic SegmentationAutonomous Driving	—Unverified
Fluorescent Neuronal Cells v2: Multi-Task, Multi-Format Annotations for Deep Learning in Microscopy	Jul 26, 2023	Benchmarkingobject-detection	—Unverified
FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks	Oct 1, 2024	BenchmarkingFairness	—Unverified
Building a continuous benchmarking ecosystem in bioinformatics	Sep 23, 2024	Benchmarking	—Unverified
Enhancing Architecture Frameworks by Including Modern Stakeholders and their Views/Viewpoints	Aug 9, 2023	Benchmarking	—Unverified
BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer	May 24, 2023	BenchmarkingCross-Lingual Transfer	—Unverified
BuckTales : A multi-UAV dataset for multi-object tracking and re-identification of wild antelopes	Nov 11, 2024	BenchmarkingMulti-Object Tracking	—Unverified
uto\!L: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks	Oct 11, 2024	BenchmarkingLanguage Modeling	—Unverified

Show:10 25 50

← PrevPage 111 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified