Benchmarking

Papers

Showing 1901–1950 of 5548 papers

Title	Date	Tasks	Status	Score
Individual Fairness Guarantees for Neural Networks	May 11, 2022	Benchmarking, Fairness	Code Available	5
CREPO: An Open Repository to Benchmark Credal Network Algorithms	May 10, 2021	Benchmarking	Code Available	5
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion	May 28, 2023	Benchmarking, Decision Making	Code Available	5
Beemo: Benchmark of Expert-edited Machine-generated Outputs	Nov 6, 2024	Benchmarking	Code Available	5
Bias Reduction via Cooperative Bargaining in Synthetic Graph Dataset Generation	May 27, 2022	Benchmarking, Dataset Generation	Code Available	5
AdamZ: An Enhanced Optimisation Method for Neural Network Training	Nov 22, 2024	Benchmarking	Code Available	5
Critical review of conformational B-cell epitope prediction methods	Jan 10, 2023	Benchmarking, Drug Design	Code Available	5
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context	Mar 29, 2024	Benchmarking, Sentence	Code Available	5
inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences	Jun 25, 2025	Benchmarking	Code Available	5
Bias Analysis and Mitigation in the Evaluation of Authorship Verification	Jul 1, 2019	Authorship Verification, Benchmarking	Code Available	5
BED: Bi-Encoder-Based Detectors for Out-of-Distribution Detection	Jun 15, 2023	Benchmarking, Out-of-Distribution Detection	Code Available	5
Improvements & Evaluations on the MLCommons CloudMask Benchmark	Mar 7, 2024	Benchmarking	Code Available	5
Benchmarking Domain Generalization Algorithms in Computational Pathology	Sep 25, 2024	Benchmarking, Data Augmentation	Code Available	5
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture	Jun 10, 2024	Benchmarking, Decoder	Code Available	5
Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs	Oct 17, 2024	Benchmarking	Code Available	5
Cross-lingual sentiment classification in low-resource Bengali language	Nov 1, 2020	Benchmarking, Classification	Code Available	5
Cross-Lingual Text Classification of Transliterated Hindi and Malayalam	Aug 31, 2021	Benchmarking, Classification	Code Available	5
Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets	May 23, 2022	Argument Mining, Benchmarking	Code Available	5
Improved Target-specific Stance Detection on Social Media Platforms by Delving into Conversation Threads	Nov 6, 2022	Benchmarking, Opinion Mining	Code Available	5
BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation	Nov 14, 2024	Adversarial Attack, Adversarial Robustness	Code Available	5
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I	Sep 12, 2024	Benchmarking, CPU	Code Available	5
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare	May 26, 2025	Benchmarking, Medical Diagnosis	Code Available	5
Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning	Jun 16, 2022	Benchmarking, Clustering	Code Available	5
Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction	Oct 20, 2021	Benchmarking, Language Modeling	Code Available	5
Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II	Sep 17, 2024	Benchmarking, Descriptive	Code Available	5
Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning	Apr 4, 2021	Benchmarking, Multi Label Text Classification	Code Available	5
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge	Jun 17, 2025	Benchmarking, Retrieval	Code Available	5
Cryo-RALib -- a modular library for accelerating alignment in cryo-EM	Nov 11, 2020	Benchmarking, GPU	Code Available	5
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation	Dec 4, 2020	Benchmarking, Machine Translation	Code Available	5
Beyond Slow Signs in High-fidelity Model Extraction	Jun 14, 2024	Benchmarking, model	Code Available	5
BdSLW60: A Word-Level Bangla Sign Language Dataset	Feb 13, 2024	Benchmarking, Gesture Recognition	Code Available	5
ANTHROPOS-V: benchmarking the novel task of Crowd Volume Estimation	Jan 3, 2025	Benchmarking, Crowd Counting	Code Available	5
Immunofluorescence Capillary Imaging Segmentation: Cases Study	Jul 14, 2022	Benchmarking, Image Segmentation	Code Available	5
Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation	Feb 5, 2024	Benchmarking, Image Segmentation	Code Available	5
Beyond Optimism: Exploration With Partially Observable Rewards	Jun 20, 2024	Benchmarking, Reinforcement Learning (RL)	Code Available	5
AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides	Apr 15, 2024	Benchmarking, Protein Language Model	Code Available	5
Impact of ImageNet Model Selection on Domain Adaptation	Feb 6, 2020	Benchmarking, Domain Adaptation	Code Available	5
Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification	Apr 23, 2024	Benchmarking, Hyperspectral Image Classification	Code Available	5
Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples	Feb 6, 2025	Benchmarking, DeepFake Detection	Code Available	5
Bayesian Neural Networks with Soft Evidence	Oct 19, 2020	Benchmarking	Code Available	5
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge	Dec 18, 2024	Benchmarking, World Knowledge	Code Available	5
CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants	Oct 28, 2024	Benchmarking	Code Available	5
A Modular Workflow for Performance Benchmarking of Neuronal Network Simulations	Dec 16, 2021	Benchmarking	Code Available	5
Beyond Marginal Uncertainty: How Accurately can Bayesian Regression Models Estimate Posterior Predictive Correlations?	Nov 6, 2020	Active Learning, Benchmarking	Code Available	5
Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization	Aug 29, 2024	Benchmarking, Diversity	Code Available	5
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions	Dec 11, 2024	Benchmarking, Question Answering	Code Available	5
Beyond Document Page Classification: Design, Datasets, and Challenges	Aug 24, 2023	Benchmarking, Classification	Code Available	5
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning	Jan 29, 2019	Benchmarking, Deep Learning	Code Available	5
IJCB 2022 Mobile Behavioral Biometrics Competition (MobileB2C)	Oct 6, 2022	Benchmarking	Code Available	5
BASED: Benchmarking, Analysis, and Structural Estimation of Deblurring	May 27, 2023	Benchmarking, Deblurring	Code Available	5


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified