SOTAVerified

Benchmarking

Papers

Showing 1901-1950 of 5548 papers

Title | Status | Hype
Individual Fairness Guarantees for Neural Networks | Code | 0
CREPO: An Open Repository to Benchmark Credal Network Algorithms | Code | 0
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion | Code | 0
Beemo: Benchmark of Expert-edited Machine-generated Outputs | Code | 0
Bias Reduction via Cooperative Bargaining in Synthetic Graph Dataset Generation | Code | 0
AdamZ: An Enhanced Optimisation Method for Neural Network Training | Code | 0
Critical review of conformational B-cell epitope prediction methods | Code | 0
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Code | 0
inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences | Code | 0
Bias Analysis and Mitigation in the Evaluation of Authorship Verification | Code | 0
BED: Bi-Encoder-Based Detectors for Out-of-Distribution Detection | Code | 0
Improvements & Evaluations on the MLCommons CloudMask Benchmark | Code | 0
Benchmarking Domain Generalization Algorithms in Computational Pathology | Code | 0
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture | Code | 0
Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs | Code | 0
Cross-lingual sentiment classification in low-resource Bengali language | Code | 0
Cross-Lingual Text Classification of Transliterated Hindi and Malayalam | Code | 0
Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets | Code | 0
Improved Target-specific Stance Detection on Social Media Platforms by Delving into Conversation Threads | Code | 0
BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation | Code | 0
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I | Code | 0
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare | Code | 0
Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning | Code | 0
Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction | Code | 0
Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II | Code | 0
Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning | Code | 0
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Code | 0
Cryo-RALib -- a modular library for accelerating alignment in cryo-EM | Code | 0
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation | Code | 0
Beyond Slow Signs in High-fidelity Model Extraction | Code | 0
BdSLW60: A Word-Level Bangla Sign Language Dataset | Code | 0
ANTHROPOS-V: benchmarking the novel task of Crowd Volume Estimation | Code | 0
Immunofluorescence Capillary Imaging Segmentation: Cases Study | Code | 0
Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation | Code | 0
Beyond Optimism: Exploration With Partially Observable Rewards | Code | 0
AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides | Code | 0
Impact of ImageNet Model Selection on Domain Adaptation | Code | 0
Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification | Code | 0
Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples | Code | 0
Bayesian Neural Networks with Soft Evidence | Code | 0
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge | Code | 0
CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants | Code | 0
A Modular Workflow for Performance Benchmarking of Neuronal Network Simulations | Code | 0
Beyond Marginal Uncertainty: How Accurately can Bayesian Regression Models Estimate Posterior Predictive Correlations? | Code | 0
Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization | Code | 0
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions | Code | 0
Beyond Document Page Classification: Design, Datasets, and Challenges | Code | 0
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning | Code | 0
IJCB 2022 Mobile Behavioral Biometrics Competition (MobileB2C) | Code | 0
BASED: Benchmarking, Analysis, and Structural Estimation of Deblurring | Code | 0
Page 39 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified