SOTAVerified

Benchmarking

Papers

Showing 1501-1525 of 5548 papers

Title | Status | Hype
Large-scale Ridesharing DARP Instances Based on Real Travel Demand | Code | 0
Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels | Code | 0
Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges | Code | 0
Laparoscopic Image Desmoking Using the U-Net with New Loss Function and Integrated Differentiable Wiener Filter | Code | 0
Benchmarking Generative Latent Variable Models for Speech | Code | 0
Benchmarking Generative AI Models for Deep Learning Test Input Generation | Code | 0
LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing | Code | 0
Language-based Image Colorization: A Benchmark and Beyond | Code | 0
LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction | Code | 0
Benchmarking Framework for Performance-Evaluation of Causal Inference Analysis | Code | 0
Benchmarking framework for machine learning classification from fNIRS data | Code | 0
LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers | Code | 0
Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation | Code | 0
LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
A Position Paper on the Automatic Generation of Machine Learning Leaderboards | Code | 0
ADVIO: An authentic dataset for visual-inertial odometry | Code | 0
ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees | Code | 0
Benchmarking Flexible Electric Loads Scheduling Algorithms under Market Price Uncertainty | Code | 0
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems | Code | 0
Accel-NASBench: Sustainable Benchmarking for Accelerator-Aware NAS | Code | 0
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0
Knowledge Enhanced Conditional Imputation for Healthcare Time-series | Code | 0
Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods | Code | 0
KhabarChin: Automatic Detection of Important News in the Persian Language | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified