SOTAVerified

Benchmarking

Papers

Showing 3301–3325 of 5548 papers

Title | Status | Hype
Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering |  | 0
Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground |  | 0
Fast Benchmarking of Asynchronous Multi-Fidelity Optimization on Zero-Cost Benchmarks | Code | 0
Classification of the Fashion-MNIST Dataset on a Quantum Computer |  | 0
Model Lakes |  | 0
a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification | Code | 0
A Bayesian Committee Machine Potential for Oxygen-containing Organic Compounds |  | 0
SINDy vs Hard Nonlinearities and Hidden Dynamics: a Benchmarking Study |  | 0
Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms |  | 0
Imitation Learning Datasets: A Toolkit For Creating Datasets, Training Agents and Benchmarking |  | 0
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models |  | 0
Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance |  | 0
The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition |  | 0
FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking | Code | 0
Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models | Code | 0
Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies |  | 0
The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky Patterns | Code | 0
A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images |  | 0
The Seeker's Dilemma: Realistic Formulation and Benchmarking for Hardware Trojan Detection |  | 0
Performance Comparison of Surrogate-Assisted Evolutionary Algorithms on Computational Fluid Dynamics Problems |  | 0
Towards Explainability and Fairness in Swiss Judgement Prediction: Benchmarking on a Multilingual Dataset |  | 0
Benchmarking LLMs on the Semantic Overlap Summarization Task |  | 0
Partial Rankings of Optimizers | Code | 0
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Code | 0
E(3)-equivariant models cannot learn chirality: Field-based molecular generation |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified