SOTAVerified

Benchmarking

Papers

Showing 1231-1240 of 5548 papers

Title | Status | Hype
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Code | 1
Large Scale MRI Collection and Segmentation of Cirrhotic Liver | Code | 1
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns | Code | 1
Light Field Salient Object Detection: A Review and Benchmark | Code | 1
Benchmarking: Past, Present and Future | Code | 1
GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies | Code | 1
LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction | Code | 1
GuacaMol: Benchmarking Models for De Novo Molecular Design | Code | 1
Benchmarking Self-Supervised Learning on Diverse Pathology Datasets | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified