SOTAVerified

Benchmarking

Papers

Showing 2581–2590 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation | Code | 3 |
| Benchmarking Large Multimodal Models against Common Corruptions | Code | 1 |
| CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Code | 1 |
| Data-Driven Target Localization: Benchmarking Gradient Descent Using the Cramer-Rao Bound | | 0 |
| Data Augmentation for Traffic Classification | | 0 |
| R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | Code | 2 |
| Harnessing Orthogonality to Train Low-Rank Neural Networks | Code | 0 |
| WAVES: Benchmarking the Robustness of Image Watermarks | Code | 2 |
| NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription | | 0 |
| Large Language Models are Null-Shot Learners | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |