SOTAVerified

Benchmarking

Papers

Showing 1976-2000 of 5548 papers

Title | Status | Hype
Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models | Code | 0
Immunofluorescence Capillary Imaging Segmentation: Cases Study | Code | 0
Impact of ImageNet Model Selection on Domain Adaptation | Code | 0
Better Late Than Never: Formulating and Benchmarking Recommendation Editing | Code | 0
Better force fields start with better data -- A data set of cation dipeptide interactions | Code | 0
BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts | Code | 0
ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning | Code | 0
BeSt-LeS: Benchmarking Stroke Lesion Segmentation using Deep Supervision | Code | 0
Balancing policy constraint and ensemble size in uncertainty-based offline reinforcement learning | Code | 0
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions | Code | 0
Action-conditioned Benchmarking of Robotic Video Prediction Models: a Comparative Study | Code | 0
Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization | Code | 0
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Code | 0
A Meta-Analysis of the Anomaly Detection Problem | Code | 0
Benchmarks for Graph Embedding Evaluation | Code | 0
BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset | Code | 0
Identifying the Smallest Adversarial Load Perturbations that Render DC-OPF Infeasible | Code | 0
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari | Code | 0
IHCV: Discovery of Hidden Time-Dependent Control Variables in Non-Linear Dynamical Systems | Code | 0
Benchmark of Deep Learning Models on Large Healthcare MIMIC Datasets | Code | 0
AlphaZip: Neural Network-Enhanced Lossless Text Compression | Code | 0
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study | Code | 0
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models | Code | 0
IdeaBench: Benchmarking Large Language Models for Research Idea Generation | Code | 0
Identifying and Benchmarking Natural Out-of-Context Prediction Problems | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified