SOTAVerified

Benchmarking

Papers

Showing 1061-1070 of 5548 papers

Title | Status | Hype
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis | Code | 1
Continual Learning with Foundation Models: An Empirical Study of Latent Replay | Code | 1
Are We There Yet? Evaluating State-of-the-Art Neural Network based Geoparsers Using EUPEG as a Benchmarking Platform | Code | 1
Are we really making much progress? Revisiting, benchmarking, and refining heterogeneous graph neural networks | Code | 1
From Claims to Evidence: A Unified Framework and Critical Analysis of CNN vs. Transformer vs. Mamba in Medical Image Segmentation | Code | 1
FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation | Code | 1
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios | Code | 1
FM-TS: Flow Matching for Time Series Generation | Code | 1
FNBench: Benchmarking Robust Federated Learning against Noisy Labels | Code | 1
Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified