SOTAVerified

Benchmarking

Papers

Showing 1576–1600 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Mind the XAI Gap: A Human-Centered LLM Framework for Democratizing Explainable AI | Code | 0 |
| crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023 | | 0 |
| Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables | | 0 |
| OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics | | 0 |
| HyBiomass: Global Hyperspectral Imagery Benchmark Dataset for Evaluating Geospatial Foundation Models in Forest Aboveground Biomass Estimation | | 0 |
| Primender Sequence: A Novel Mathematical Construct for Testing Symbolic Inference and AI Reasoning | | 0 |
| Sum Rate Maximization for Pinching Antennas Assisted RSMA System With Multiple Waveguides | | 0 |
| FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models | | 0 |
| HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Code | 0 |
| ScholarSearch: Benchmarking Scholar Searching Ability of LLMs | | 0 |
| ICE-ID: A Novel Historical Census Data Benchmark Comparing NARS against LLMs, & a ML Ensemble on Longitudinal Identity Resolution | | 0 |
| Bench to the Future: A Pastcasting Benchmark for Forecasting Agents | | 0 |
| Reasoning as a Resource: Optimizing Fast and Slow Thinking in Code Generation Models | | 0 |
| GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments | | 0 |
| A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild | Code | 0 |
| Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens | | 0 |
| Graph Attention-based Decentralized Actor-Critic for Dual-Objective Control of Multi-UAV Swarms | | 0 |
| AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP | | 0 |
| Solving excited states for long-range interacting trapped ions with neural networks | | 0 |
| Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech | | 0 |
| Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework | | 0 |
| GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors | | 0 |
| Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting | | 0 |
| The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning | Code | 0 |
| Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding | | 0 |
Page 64 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |