SOTAVerified

Benchmarking

Papers

Showing 3651 to 3675 of 5548 papers

Title | Status | Hype
Alexpaca: Learning Factual Clarification Question Generation Without Examples | | 0
Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech | | 0
Benchmarking Foundation Models with Language-Model-as-an-Examiner | | 0
Benchmarking Foundation Models for Zero-Shot Biometric Tasks | | 0
MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents | | 0
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases | | 0
Benchmarking foundation models as feature extractors for weakly-supervised computational pathology | | 0
Model Agnostic Explainable Selective Regression via Uncertainty Estimation | | 0
Model-based trajectory stitching for improved behavioural cloning and its applications | | 0
Model-Based Underwater 6D Pose Estimation from RGB | | 0
Benchmarking for Public Health Surveillance tasks on Social Media with a Domain-Specific Pretrained Language Model | | 0
ModelHub.AI: Dissemination Platform for Deep Learning Models | | 0
Model Lakes | | 0
Modelling Neuronal Behaviour with Time Series Regression: Recurrent Neural Networks on C. Elegans Data | | 0
Modelling neuronal behaviour with time series regression: Recurrent Neural Networks on synthetic C. elegans data | | 0
Modelling Regional Solar Photovoltaic Capacity in Great Britain | | 0
Model-predictive control and reinforcement learning in multi-energy system case studies | | 0
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities | | 0
Modern CNNs for IoT Based Farms | | 0
Modern, Efficient, and Differentiable Transport Equation Models using JAX: Applications to Population Balance Equations | | 0
Modified CMA-ES Algorithm for Multi-Modal Optimization: Incorporating Niching Strategies and Dynamic Adaptation Mechanism | | 0
ModuLM: Enabling Modular and Multimodal Molecular Relational Learning with Large Language Models | | 0
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | | 0
MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes | | 0
MO-IOHinspector: Anytime Benchmarking of Multi-Objective Algorithms using IOHprofiler | | 0
Page 147 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified