SOTAVerified

Benchmarking

Papers

Showing 1971-1980 of 5548 papers

Title | Status | Hype
Benchmarking Vision Language Models on German Factual Data | | 0
Benchmarking Next-Generation Reasoning-Focused Large Language Models in Ophthalmology: A Head-to-Head Evaluation on 5,888 Items | | 0
Mamba-Based Ensemble learning for White Blood Cell Classification | Code | 0
GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR | | 0
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts | | 0
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography | | 0
BoTTA: Benchmarking on-device Test Time Adaptation | | 0
Benchmarking 3D Human Pose Estimation Models Under Occlusions | | 0
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models | | 0
Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified