SOTAVerified

Benchmarking

Papers

Showing 2751–2800 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Bi-DCSpell: A Bi-directional Detector-Corrector Interactive Framework for Chinese Spelling Check | | 0 |
| BIAS: Transparent reporting of biomedical image analysis challenges | | 0 |
| Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey | | 0 |
| Genicious: Contextual Few-shot Prompting for Insights Discovery | | 0 |
| Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking | | 0 |
| Beyond Uniform Lipschitz Condition in Differentially Private Optimization | | 0 |
| Writing as a testbed for open ended agents | | 0 |
| GenSpace: Benchmarking Spatially-Aware Image Generation | | 0 |
| GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks | | 0 |
| GenzIQA: Generalized Image Quality Assessment using Prompt-Guided Latent Diffusion Models | | 0 |
| Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis | | 0 |
| Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing | | 0 |
| Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data | | 0 |
| Energy Models for Better Pseudo-Labels: Improving Semi-Supervised Classification with the 1-Laplacian Graph Energy | | 0 |
| GeoGebra Tools with Proof Capabilities | | 0 |
| Language Models as a Service: Overview of a New Paradigm and its Challenges | | 0 |
| Geometric feature performance under downsampling for EEG classification tasks | | 0 |
| Geometry-Based Next Frame Prediction from Monocular Video | | 0 |
| Geometry Matters: Benchmarking Scientific ML Approaches for Flow Prediction around Complex Geometries | | 0 |
| GeoNet: Benchmarking Unsupervised Adaptation across Geographies | | 0 |
| Geospatial Foundation Models to Enable Progress on Sustainable Development Goals | | 0 |
| GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy | | 0 |
| Beyond Static Models and Test Sets: Benchmarking the Potential of Pre-trained Models Across Tasks and Languages | | 0 |
| Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages | | 0 |
| GFPNet: A Deep Network for Learning Shape Completion in Generic Fitted Primitives | | 0 |
| A Hong Kong Sign Language Corpus Collected from Sign-interpreted TV News | | 0 |
| GiCCS: A German in-Context Conversational Similarity Benchmark | | 0 |
| GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking | | 0 |
| GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra | | 0 |
| Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms | | 0 |
| Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems | | 0 |
| The Benchmark Lottery | | 0 |
| Global Rice Multi-Class Segmentation Dataset (RiceSEG): A Comprehensive and Diverse High-Resolution RGB-Annotated Images for the Development and Benchmarking of Rice Segmentation Algorithms | | 0 |
| Global Wheat Head Dataset 2021: more diversity to improve the benchmarking of wheat head localization methods | | 0 |
| Beyond Monocular Deraining: Stereo Image Deraining via Semantic Understanding | | 0 |
| GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation | | 0 |
| GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System | | 0 |
| A Benchmark for Multi-speaker Anonymization | | 0 |
| Beyond Monocular Deraining: Parallel Stereo Deraining Network Via Semantic Prior | | 0 |
| Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks | | 0 |
| GNUMAP: A Parameter-Free Approach to Unsupervised Dimensionality Reduction via Graph Neural Networks | | 0 |
| Goal-Driven Sequential Data Abstraction | | 0 |
| A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation | | 0 |
| Domain Adaptation with Joint Learning for Generic, Optical Car Part Recognition and Detection Systems (Go-CaRD) | | 0 |
| Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding | | 0 |
| The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI | | 0 |
| GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models | | 0 |
| GreenPCO: An Unsupervised Lightweight Point Cloud Odometry Method | | 0 |
| Ahead-of-Time P-Tuning | | 0 |
| Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding | | 0 |
Page 56 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |