SOTAVerified

Benchmarking

Papers

Showing 821–830 of 5548 papers

Title | Status | Hype
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA | Code | 1
APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond | Code | 1
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models | Code | 1
RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation | Code | 1
FiFAR: A Fraud Detection Dataset for Learning to Defer | Code | 1
TAO-Amodal: A Benchmark for Tracking Any Object Amodally | Code | 1
How to Train Neural Field Representations: A Comprehensive Study and Benchmark | Code | 1
Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models | Code | 1
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Code | 1
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified