SOTAVerified

Benchmarking

Papers

Showing 2691-2700 of 5548 papers

Title | Status | Hype
QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers | Code | 0
Named Clinical Entity Recognition Benchmark | Code | 0
Precise Model Benchmarking with Only a Few Observations | - | 0
Rule-based Data Selection for Large Language Models | - | 0
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models | Code | 0
Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems | - | 0
Adjusting Pretrained Backbones for Performativity | Code | 0
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection | - | 0
Implicit to Explicit Entropy Regularization: Benchmarking ViT Fine-tuning under Noisy Labels | - | 0
Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified