SOTAVerified

Benchmarking

Papers

Showing 2291-2300 of 5548 papers

Title | Status | Hype
A Universal Protocol to Benchmark Camera Calibration for Sports | - | 0
A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting | Code | 0
nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation | Code | 1
A Large-Scale Evaluation of Speech Foundation Models | - | 0
MMInA: Benchmarking Multihop Multimodal Internet Agents | - | 0
MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems | Code | 1
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | Code | 1
A Review and Efficient Implementation of Scene Graph Generation Metrics | Code | 1
AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides | Code | 0
RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified