SOTAVerified

Benchmarking

Papers

Showing 1026-1050 of 5548 papers

Title | Status | Hype
FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation | Code | 1
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis | Code | 1
Foundation Model of Electronic Medical Records for Adaptive Risk Estimation | Code | 1
A skeletonization algorithm for gradient-based optimization | Code | 1
Benchmarking Visual Localization for Autonomous Navigation | Code | 1
FiFAR: A Fraud Detection Dataset for Learning to Defer | Code | 1
A GPU-accelerated Large-scale Simulator for Transportation System Optimization Benchmarking | Code | 1
FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging | Code | 1
A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization | Code | 1
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding | Code | 1
Benchmarking emergency department triage prediction models with machine learning and large public electronic health records | Code | 1
Benchmarking Pathology Feature Extractors for Whole Slide Image Classification | Code | 1
FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1
FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods | Code | 1
FineSurE: Fine-grained Summarization Evaluation using LLMs | Code | 1
AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction | Code | 1
A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging | Code | 1
A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models | Code | 1
A global analysis of metrics used for measuring performance in natural language processing | Code | 1
Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces | Code | 1
FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data | Code | 1
Benchmarking: Past, Present and Future | Code | 1
FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks | Code | 1
A Comparative Attention Framework for Better Few-Shot Object Detection on Aerial Images | Code | 1
ArtFID: Quantitative Evaluation of Neural Style Transfer | Code | 1
Page 42 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified