SOTAVerified

Benchmarking

Papers

Showing 3901–3925 of 5548 papers

Title | Status | Hype
Benchmarking bias: Expanding clinical AI model card to incorporate bias reporting of social and non-social factors | | 0
Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks | | 0
Official-NV: An LLM-Generated News Video Dataset for Multimodal Fake News Detection | | 0
Off-policy Evaluation for Payments at Adyen | | 0
Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation | | 0
TransBench: Benchmarking Machine Translation for Industrial-Scale Applications | | 0
OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics | | 0
IBB Traffic Graph Data: Benchmarking and Road Traffic Prediction Model | | 0
Benchmarking Azerbaijani Neural Machine Translation | | 0
Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver | | 0
Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking | | 0
Benchmarking AutoML Frameworks for Disease Prediction Using Medical Claims | | 0
Omnibenchmark (alpha) for continuous and open benchmarking in bioinformatics | | 0
Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics | | 0
OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions | | 0
Benchmarking Automated Review Response Generation for the Hospitality Domain | | 0
Benchmarking Automated Machine Learning Methods for Price Forecasting Applications | | 0
OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB | | 0
On Benchmarking Code LLMs for Android Malware Analysis | | 0
On Benchmarking Iris Recognition within a Head-mounted Display for AR/VR Application | | 0
On Continual Model Refinement in Out-of-Distribution Data Streams | | 0
Active Learning for Community Detection in Stochastic Block Models | | 0
On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events | | 0
Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos | | 0
On Distribution Grid Optimal Power Flow Development and Integration | | 0
Page 157 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified