SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1061–1070 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis	Mar 9, 2021	BenchmarkingClassification	CodeCode Available	1	5
Continual Learning with Foundation Models: An Empirical Study of Latent Replay	Apr 30, 2022	BenchmarkingContinual Learning	CodeCode Available	1	5
Are We There Yet? Evaluating State-of-the-Art Neural Network based Geoparsers Using EUPEG as a Benchmarking Platform	Jul 15, 2020	ArticlesBenchmarking	CodeCode Available	1	5
Are we really making much progress? Revisiting, benchmarking, and refining heterogeneous graph neural networks	Dec 30, 2021	BenchmarkingHeterogeneous Node Classification	CodeCode Available	1	5
From Claims to Evidence: A Unified Framework and Critical Analysis of CNN vs. Transformer vs. Mamba in Medical Image Segmentation	Mar 3, 2025	BenchmarkingComputational Efficiency	CodeCode Available	1	5
FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation	May 27, 2025	BenchmarkingDecision Making	CodeCode Available	1	5
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios	May 22, 2025	BenchmarkingInstruction Following	CodeCode Available	1	5
FM-TS: Flow Matching for Time Series Generation	Nov 12, 2024	BenchmarkingImputation	CodeCode Available	1	5
FNBench: Benchmarking Robust Federated Learning against Noisy Labels	May 10, 2025	BenchmarkingFederated Learning	CodeCode Available	1	5
Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs	Nov 29, 2023	Benchmarking	CodeCode Available	1	5

Show:10 25 50

← PrevPage 107 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified