SOTAVerified

Benchmarking

Papers

Showing 1776–1800 of 5548 papers

Title | Status | Hype
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools | - | 0
Benchmarking ASR Systems Based on Post-Editing Effort and Error Analysis | - | 0
CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices | - | 0
LAraBench: Benchmarking Arabic AI with Large Language Models | - | 0
Cognitive Model Priors for Predicting Human Decisions | - | 0
Coherent Feed Forward Quantum Neural Network | - | 0
Rethinking Coherence Modeling: Synthetic vs. Downstream Tasks | - | 0
ChemTime: Rapid and Early Classification for Multivariate Time Series Classification of Chemical Sensors | - | 0
An Empirical Study of Super-resolution on Low-resolution Micro-expression Recognition | - | 0
Diverse Community Data for Benchmarking Data Privacy Algorithms | - | 0
ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models | - | 0
An Empirical Study of Benchmarking Chinese Aspect Sentiment Quad Prediction | - | 0
Colonoscopy 3D Video Dataset with Paired Depth from 2D-3D Registration | - | 0
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance | - | 0
ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task | - | 0
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | - | 0
Distribution-Based Invariant Deep Networks for Learning Meta-Features | - | 0
Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories | - | 0
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | - | 0
ChatGPT Alternative Solutions: Large Language Models Survey | - | 0
Commute Graph Neural Networks | - | 0
An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets | - | 0
Chart-to-Experience: Benchmarking Multimodal LLMs for Predicting Experiential Impact of Charts | - | 0
Distributed Training Large-Scale Deep Architectures | - | 0
Sensitivity analysis and experimental evaluation of PID-like continuous sliding mode control | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified