SOTAVerified

Benchmarking

Papers

Showing 2751-2775 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Bi-DCSpell: A Bi-directional Detector-Corrector Interactive Framework for Chinese Spelling Check | | 0 |
| BIAS: Transparent reporting of biomedical image analysis challenges | | 0 |
| Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey | | 0 |
| Genicious: Contextual Few-shot Prompting for Insights Discovery | | 0 |
| Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking | | 0 |
| Beyond Uniform Lipschitz Condition in Differentially Private Optimization | | 0 |
| Writing as a testbed for open ended agents | | 0 |
| GenSpace: Benchmarking Spatially-Aware Image Generation | | 0 |
| GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks | | 0 |
| GenzIQA: Generalized Image Quality Assessment using Prompt-Guided Latent Diffusion Models | | 0 |
| Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis | | 0 |
| Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing | | 0 |
| Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data | | 0 |
| Energy Models for Better Pseudo-Labels: Improving Semi-Supervised Classification with the 1-Laplacian Graph Energy | | 0 |
| GeoGebra Tools with Proof Capabilities | | 0 |
| Language Models as a Service: Overview of a New Paradigm and its Challenges | | 0 |
| Geometric feature performance under downsampling for EEG classification tasks | | 0 |
| Geometry-Based Next Frame Prediction from Monocular Video | | 0 |
| Geometry Matters: Benchmarking Scientific ML Approaches for Flow Prediction around Complex Geometries | | 0 |
| GeoNet: Benchmarking Unsupervised Adaptation across Geographies | | 0 |
| Geospatial Foundation Models to Enable Progress on Sustainable Development Goals | | 0 |
| GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy | | 0 |
| Beyond Static Models and Test Sets: Benchmarking the Potential of Pre-trained Models Across Tasks and Languages | | 0 |
| Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages | | 0 |
| GFPNet: A Deep Network for Learning Shape Completion in Generic Fitted Primitives | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |