SOTAVerified

Benchmarking

Papers

Showing 2351–2375 of 5548 papers

Title | Status | Hype
A Look at the Evaluation Setup of the M5 Forecasting Competition | | 0
From 2D to 3D: Re-thinking Benchmarking of Monocular Depth Prediction | | 0
From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | | 0
Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data | | 0
A Comprehensive Survey on Retrieval Methods in Recommender Systems | | 0
ALOJA-ML: A Framework for Automating Characterization and Knowledge Discovery in Hadoop Deployments | | 0
Benchmarking unsupervised near-duplicate image detection | | 0
Abasy Atlas v2.2: The most comprehensive and up-to-date inventory of meta-curated, historical, bacterial regulatory networks, their completeness and system-level characterization | | 0
FRED: The Florence RGB-Event Drone Dataset | | 0
Benchmarking Unsupervised Anomaly Detection and Localization | | 0
Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning | | 0
Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs | | 0
Benchmarking Uncertainty Quantification on Biosignal Classification Tasks under Dataset Shift | | 0
Automatic vehicle trajectory data reconstruction at scale | | 0
ALOJA: A Framework for Benchmarking and Predictive Analytics in Big Data Deployments | | 0
Benchmarking Ultra-Low-Power μNPUs | | 0
Automatic Target Recognition on Synthetic Aperture Radar Imagery: A Survey | | 0
Benchmarking Ultra-High-Definition Image Super-Resolution | | 0
Almost Equivariance via Lie Algebra Convolutions | | 0
Benchmarking performance, explainability, and evaluation strategies of vision-language models for surgery: Challenges and opportunities | | 0
Benchmarking Twitter Sentiment Analysis Tools | | 0
MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models | | 0
Automatic segmenting teeth in X-ray images: Trends, a novel data set, benchmarking and future perspectives | | 0
Benchmarking Transformers-based models on French Spoken Language Understanding tasks | | 0
Scaling laws in global corporations as a benchmarking approach to assess environmental performance | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified