SOTAVerified

Benchmarking

Papers

Showing 2831-2840 of 5548 papers

Title | Status | Hype
A practical generalization metric for deep networks benchmarking |  | 0
Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages |  | 0
Accelerating the discovery of steady-states of planetary interior dynamics with machine learning |  | 0
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists | Code | 0
Understanding the User: An Intent-Based Ranking Dataset |  | 0
Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction |  | 0
Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization | Code | 0
Benchmarking foundation models as feature extractors for weakly-supervised computational pathology |  | 0
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games |  | 0
VHAKG: A Multi-modal Knowledge Graph Based on Synchronized Multi-view Videos of Daily Activities | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified