SOTAVerified

Benchmarking

Papers

Showing 3381-3390 of 5548 papers

Title | Status | Hype
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA | | 0
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models | Code | 0
Benchmarking with MIMIC-IV, an irregular, spare clinical time series dataset | | 0
SAM-based instance segmentation models for the automation of structural damage detection | | 0
Biological Valuation Map of Flanders: A Sentinel-2 Imagery Analysis | | 0
Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs | | 0
Automated legal reasoning with discretion to act using s(LAW) | | 0
TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images | | 0
Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding | | 0
Benchmarking the Fairness of Image Upsampling Methods | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified