SOTAVerified

Benchmarking

Papers

Showing 3311-3320 of 5548 papers

Title | Status | Hype
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models | | 0
Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance | | 0
The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition | | 0
FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking | Code | 0
Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models | Code | 0
Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies | | 0
The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns | Code | 0
A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images | | 0
The Seeker's Dilemma: Realistic Formulation and Benchmarking for Hardware Trojan Detection | | 0
Performance Comparison of Surrogate-Assisted Evolutionary Algorithms on Computational Fluid Dynamics Problems | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified