SOTAVerified

Benchmarking

Papers

Showing 3151–3175 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Is margin all you need? An extensive empirical study of active learning on tabular data | | 0 |
| Benchmarking real-time monitoring strategies for ethanol production from lignocellulosic biomass | | 0 |
| IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations | | 0 |
| Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval | | 0 |
| Benchmarking real-time algorithms for in-phase auditory stimulation of low amplitude slow waves with wearable EEG devices during sleep | | 0 |
| Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? | | 0 |
| Is Self-Supervision Enough? Benchmarking Foundation Models Against End-to-End Training for Mitotic Figure Classification | | 0 |
| Is Single-View Mesh Reconstruction Ready for Robotics? | | 0 |
| Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images | | 0 |
| Is Synthetic Dataset Reliable for Benchmarking Generalizable Person Re-Identification? | | 0 |
| Is Transfer Learning Necessary for Protein Landscape Prediction? | | 0 |
| Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages? | | 0 |
| Is Your Benchmark (Still) Useful? Dynamic Benchmarking for Code Language Models | | 0 |
| The Trap of Presumed Equivalence: Artificial General Intelligence Should Not Be Assessed on the Scale of Human Intelligence | | 0 |
| A Flatter Loss for Bias Mitigation in Cross-dataset Facial Age Estimation | | 0 |
| Is Your Paper Being Reviewed by an LLM? A New Benchmark Dataset and Approach for Detecting AI Text in Peer Review | | 0 |
| Benchmarking Randomized Optimization Algorithms on Binary, Permutation, and Combinatorial Problem Landscapes | | 0 |
| Iterated Invariant Extended Kalman Filter (IterIEKF) | | 0 |
| Benchmarking Quantum Hardware for Training of Fully Visible Boltzmann Machines | | 0 |
| It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives | | 0 |
| "It's a Match!" -- A Benchmark of Task Affinity Scores for Joint Learning | | 0 |
| iWarded: A System for Benchmarking Datalog+/- Reasoning (technical report) | | 0 |
| IXGS-Intraoperative 3D Reconstruction from Sparse, Arbitrarily Posed Real X-rays | | 0 |
| Jailbreak Distillation: Renewable Safety Benchmarking | | 0 |
| The Unconstrained Ear Recognition Challenge | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |