SOTAVerified

Benchmarking

Papers

Showing 1551–1575 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| ANTHROPOS-V: benchmarking the novel task of Crowd Volume Estimation | Code | 0 |
| LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers | Code | 0 |
| Adversarial Environment Generation for Learning to Navigate the Web | Code | 0 |
| Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods | Code | 0 |
| Benchmarking Dynamic SLO Compliance in Distributed Computing Continuum Systems | Code | 0 |
| Answer Consolidation: Formulation and Benchmarking | Code | 0 |
| Benchmarking down-scaled (not so large) pre-trained language models | Code | 0 |
| Benchmarking down-scaled (not so large) pre-trained language models | Code | 0 |
| Large Scale Clustering with Variational EM for Gaussian Mixture Models | Code | 0 |
| Benchmarking Domain Generalization Algorithms in Computational Pathology | Code | 0 |
| SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0 |
| Benchmarking Distributional Alignment of Large Language Models | Code | 0 |
| A novel evaluation methodology for supervised Feature Ranking algorithms | Code | 0 |
| Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems | Code | 0 |
| Knowledge Enhanced Conditional Imputation for Healthcare Time-series | Code | 0 |
| Benchmarking Differentially Private Residual Networks for Medical Imagery | Code | 0 |
| Benchmarking Dependence Measures to Prevent Shortcut Learning in Medical Imaging | Code | 0 |
| KhabarChin: Automatic Detection of Important News in the Persian Language | Code | 0 |
| KArSL: Arabic Sign Language Database | Code | 0 |
| Benchmarking Deep Spiking Neural Networks on Neuromorphic Hardware | Code | 0 |
| Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | Code | 0 |
| Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0 |
| LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions | Code | 0 |
| An Optical Control Environment for Benchmarking Reinforcement Learning Algorithms | Code | 0 |
| Joint Multi-Scale Tone Mapping and Denoising for HDR Image Enhancement | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |