SOTAVerified

Benchmarking

Papers

Showing 761-770 of 5548 papers

Title | Status | Hype
A multi-schematic classifier-independent oversampling approach for imbalanced datasets | Code | 1
Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle? | Code | 1
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity | Code | 1
Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization | Code | 1
Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones | Code | 1
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models | Code | 1
AirSim Drone Racing Lab | Code | 1
A SWAT-based Reinforcement Learning Framework for Crop Management | Code | 1
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Code | 1
Benchmarking Large Multimodal Models against Common Corruptions | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified