Benchmarking

Papers

Showing 1026–1050 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation	May 27, 2025	Benchmarking, Decision Making	Code Available	1	5
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis	Mar 9, 2021	Benchmarking, Classification	Code Available	1	5
Foundation Model of Electronic Medical Records for Adaptive Risk Estimation	Feb 10, 2025	Benchmarking	Code Available	1	5
A skeletonization algorithm for gradient-based optimization	Sep 5, 2023	Benchmarking, Deep Learning	Code Available	1	5
Benchmarking Visual Localization for Autonomous Navigation	Mar 24, 2022	Autonomous Navigation, Benchmarking	Code Available	1	5
FiFAR: A Fraud Detection Dataset for Learning to Defer	Dec 20, 2023	Benchmarking, Decision Making	Code Available	1	5
A GPU-accelerated Large-scale Simulator for Transportation System Optimization Benchmarking	Jun 15, 2024	Benchmarking, GPU	Code Available	1	5
FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging	Jun 6, 2025	Benchmarking	Code Available	1	5
A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization	Aug 10, 2023	Benchmarking, Decision Making	Code Available	1	5
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding	Sep 27, 2021	Benchmarking, Natural Language Understanding	Code Available	1	5
Benchmarking emergency department triage prediction models with machine learning and large public electronic health records	Nov 22, 2021	Benchmarking	Code Available	1	5
Benchmarking Pathology Feature Extractors for Whole Slide Image Classification	Nov 20, 2023	Benchmarking, image-classification	Code Available	1	5
FELM: Benchmarking Factuality Evaluation of Large Language Models	Oct 1, 2023	Benchmarking, Math	Code Available	1	5
FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods	Jun 15, 2023	Benchmarking, Fairness	Code Available	1	5
FineSurE: Fine-grained Summarization Evaluation using LLMs	Jul 1, 2024	Benchmarking, Hallucination	Code Available	1	5
AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction	Jul 25, 2024	Benchmarking, Deep Learning	Code Available	1	5
A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging	Apr 26, 2020	Benchmarking, Left Atrium Segmentation	Code Available	1	5
A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models	Mar 31, 2023	Benchmarking, Causal Discovery	Code Available	1	5
A global analysis of metrics used for measuring performance in natural language processing	Apr 25, 2022	Benchmarking, Machine Translation	Code Available	1	5
Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces	May 23, 2023	Benchmarking	Code Available	1	5
FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data	Mar 7, 2025	Benchmarking, Federated Learning	Code Available	1	5
Benchmarking: Past, Present and Future	Aug 1, 2021	Benchmarking, Reading Comprehension	Code Available	1	5
FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks	Nov 22, 2021	Benchmarking, Federated Learning	Code Available	1	5
A Comparative Attention Framework for Better Few-Shot Object Detection on Aerial Images	Oct 25, 2022	Benchmarking, Few-Shot Object Detection	Code Available	1	5
ArtFID: Quantitative Evaluation of Neural Style Transfer	Jul 25, 2022	Benchmarking, Meta-Learning	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified