SOTAVerified

Benchmarking

Papers

Showing 3901–3950 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking bias: Expanding clinical AI model card to incorporate bias reporting of social and non-social factors | | 0 |
| Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks | | 0 |
| Official-NV: An LLM-Generated News Video Dataset for Multimodal Fake News Detection | | 0 |
| Off-policy Evaluation for Payments at Adyen | | 0 |
| Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation | | 0 |
| TransBench: Benchmarking Machine Translation for Industrial-Scale Applications | | 0 |
| OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics | | 0 |
| IBB Traffic Graph Data: Benchmarking and Road Traffic Prediction Model | | 0 |
| Benchmarking Azerbaijani Neural Machine Translation | | 0 |
| Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver | | 0 |
| Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking | | 0 |
| Benchmarking AutoML Frameworks for Disease Prediction Using Medical Claims | | 0 |
| Omnibenchmark (alpha) for continuous and open benchmarking in bioinformatics | | 0 |
| Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics | | 0 |
| OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions | | 0 |
| Benchmarking Automated Review Response Generation for the Hospitality Domain | | 0 |
| Benchmarking Automated Machine Learning Methods for Price Forecasting Applications | | 0 |
| OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB | | 0 |
| On Benchmarking Code LLMs for Android Malware Analysis | | 0 |
| On Benchmarking Iris Recognition within a Head-mounted Display for AR/VR Application | | 0 |
| On Continual Model Refinement in Out-of-Distribution Data Streams | | 0 |
| Active Learning for Community Detection in Stochastic Block Models | | 0 |
| On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events | | 0 |
| Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos | | 0 |
| On Distribution Grid Optimal Power Flow Development and Integration | | 0 |
| ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities | | 0 |
| One Label, One Billion Faces: Usage and Consistency of Racial Categories in Computer Vision | | 0 |
| Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese | | 0 |
| One of these (Few) Things is Not Like the Others | | 0 |
| Benchmarking Audio Deepfake Detection Robustness in Real-world Communication Scenarios | | 0 |
| One-Shot Federated Learning with Classifier-Free Diffusion Models | | 0 |
| On Evaluation of Bangla Word Analogies | | 0 |
| On Evaluation of Document Classification using RVL-CDIP | | 0 |
| Benchmarking Attention Mechanisms and Consistency Regularization Semi-Supervised Learning for Post-Flood Building Damage Assessment in Satellite Images | | 0 |
| On General Language Understanding | | 0 |
| Benchmarking ASR Systems Based on Post-Editing Effort and Error Analysis | | 0 |
| Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions | | 0 |
| Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots | | 0 |
| On loss functions and evaluation metrics for music source separation | | 0 |
| Only Time Can Tell: Discovering Temporal Data for Temporal Modeling | | 0 |
| On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction | | 0 |
| An Approach to Evaluate Modeling Adequacy for Small-Signal Stability Analysis of IBR-related SSOs in Multimachine Systems | | 0 |
| On Neural Inertial Classification Networks for Pedestrian Activity Recognition | | 0 |
| Zero-Forcing Max-Power Beamforming for Hybrid mmWave Full-Duplex MIMO Systems | | 0 |
| LAraBench: Benchmarking Arabic AI with Large Language Models | | 0 |
| On quantifying and improving realism of images generated with diffusion | | 0 |
| Active Evaluation Acquisition for Efficient LLM Benchmarking | | 0 |
| On Symbiosis of Attribute Prediction and Semantic Segmentation | | 0 |
| On the Assessment of Benchmark Suites for Algorithm Comparison | | 0 |
| On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation | | 0 |
Page 79 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |