SOTAVerified

Benchmarking

Papers

Showing 1926–1950 of 5548 papers

Title | Status | Hype
Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning | Code | 0
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Code | 0
Cryo-RALib -- a modular library for accelerating alignment in cryo-EM | Code | 0
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation | Code | 0
Beyond Slow Signs in High-fidelity Model Extraction | Code | 0
BdSLW60: A Word-Level Bangla Sign Language Dataset | Code | 0
ANTHROPOS-V: benchmarking the novel task of Crowd Volume Estimation | Code | 0
Immunofluorescence Capillary Imaging Segmentation: Cases Study | Code | 0
Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation | Code | 0
Beyond Optimism: Exploration With Partially Observable Rewards | Code | 0
AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides | Code | 0
Impact of ImageNet Model Selection on Domain Adaptation | Code | 0
Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification | Code | 0
Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples | Code | 0
Bayesian Neural Networks with Soft Evidence | Code | 0
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge | Code | 0
CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants | Code | 0
A Modular Workflow for Performance Benchmarking of Neuronal Network Simulations | Code | 0
Beyond Marginal Uncertainty: How Accurately can Bayesian Regression Models Estimate Posterior Predictive Correlations? | Code | 0
Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization | Code | 0
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions | Code | 0
Beyond Document Page Classification: Design, Datasets, and Challenges | Code | 0
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning | Code | 0
IJCB 2022 Mobile Behavioral Biometrics Competition (MobileB2C) | Code | 0
BASED: Benchmarking, Analysis, and Structural Estimation of Deblurring | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified