Benchmarking

Papers

Showing 1851–1900 of 5548 papers

Title	Date	Tasks	Status	Score
AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies	Feb 19, 2024	Benchmarking	Code Available	5
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM	Oct 8, 2014	Benchmarking	Code Available	5
Inverse Contextual Bandits: Learning How Behavior Evolves over Time	Jul 13, 2021	Benchmarking, Decision Making	Code Available	5
INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition	Jun 10, 2024	Benchmarking, Emotion Recognition	Code Available	5
LMEMs for post-hoc analysis of HPO Benchmarking	Aug 5, 2024	Benchmarking, Hyperparameter Optimization	Code Available	5
Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance	Sep 22, 2024	AutoML, Benchmarking	Code Available	5
IPC: A Benchmark Data Set for Learning with Graph-Structured Data	May 15, 2019	Benchmarking, Graph Classification	Code Available	5
Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness	Oct 28, 2023	Benchmarking, image-classification	Code Available	5
Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models	Mar 11, 2025	Benchmarking, Hyperparameter Optimization	Code Available	5
BONES: a Benchmark fOr Neural Estimation of Shapley values	Jul 23, 2024	Benchmarking	Code Available	5
BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation	Jan 27, 2021	Benchmarking, Text Generation	Code Available	5
Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box	Mar 4, 2022	Benchmarking, counterfactual	Code Available	5
Integrating Expert Knowledge into Logical Programs via LLMs	Feb 17, 2025	Benchmarking, Logical Reasoning	Code Available	5
BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts	Dec 3, 2024	Age And Gender Classification, Age and Gender Estimation	Code Available	5
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition	Dec 23, 2021	Benchmarking, Deep Learning	Code Available	5
Benchmark data and method for real-time people counting in cluttered scenes using depth sensors	Apr 12, 2018	Benchmarking	Code Available	5
Continuous Optimization Benchmarks by Simulation	Aug 14, 2020	Benchmarking, Gaussian Processes	Code Available	5
Advancing and Benchmarking Personalized Tool Invocation for LLMs	May 7, 2025	Benchmarking, World Knowledge	Code Available	5
BLESS: Benchmarking Large Language Models on Sentence Simplification	Oct 24, 2023	Benchmarking, Diversity	Code Available	5
A Benchmarking Dataset with 2440 Organic Molecules for Volume Distribution at Steady State	Nov 10, 2022	Benchmarking, feature selection	Code Available	5
inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences	Jun 25, 2025	Benchmarking	Code Available	5
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios	Mar 8, 2025	Benchmarking, Diagnostic	Code Available	5
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context	Mar 29, 2024	Benchmarking, Sentence	Code Available	5
Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples	Feb 6, 2025	Benchmarking, DeepFake Detection	Code Available	5
Individual Fairness Guarantees for Neural Networks	May 11, 2022	Benchmarking, Fairness	Code Available	5
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture	Jun 10, 2024	Benchmarking, Decoder	Code Available	5
BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis	May 14, 2025	Benchmarking, Computational Efficiency	Code Available	5
BioSentVec: creating sentence embeddings for biomedical texts	Oct 22, 2018	Articles, Benchmarking	Code Available	5
BenchENAS: A Benchmarking Platform for Evolutionary Neural Architecture Search	Dec 1, 2022	Benchmarking, GPU	Code Available	5
Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning	Apr 4, 2021	Benchmarking, Multi Label Text Classification	Code Available	5
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion	May 28, 2023	Benchmarking, Decision Making	Code Available	5
BenchENAS: A Benchmarking Platform for Evolutionary Neural Architecture Search	Aug 9, 2021	Benchmarking, GPU	Code Available	5
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I	Sep 12, 2024	Benchmarking, CPU	Code Available	5
Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II	Sep 17, 2024	Benchmarking, Descriptive	Code Available	5
Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification	Apr 23, 2024	Benchmarking, Hyperspectral Image Classification	Code Available	5
Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction	Oct 20, 2021	Benchmarking, Language Modeling	Code Available	5
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge	Jun 17, 2025	Benchmarking, Retrieval	Code Available	5
Improved Target-specific Stance Detection on Social Media Platforms by Delving into Conversation Threads	Nov 6, 2022	Benchmarking, Opinion Mining	Code Available	5
Improvements & Evaluations on the MLCommons CloudMask Benchmark	Mar 7, 2024	Benchmarking	Code Available	5
Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks	Aug 17, 2023	Benchmarking, EEG	Code Available	5
BioFors: A Large Biomedical Image Forensics Dataset	Aug 30, 2021	Benchmarking, Image Forensics	Code Available	5
Immunofluorescence Capillary Imaging Segmentation: Cases Study	Jul 14, 2022	Benchmarking, Image Segmentation	Code Available	5
ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning	Sep 30, 2024	Benchmarking, Disparity Estimation	Code Available	5
Impact of ImageNet Model Selection on Domain Adaptation	Feb 6, 2020	Benchmarking, Domain Adaptation	Code Available	5
Benchmarking Attribution Methods with Relative Feature Importance	Jul 23, 2019	Benchmarking, Feature Importance	Code Available	5
Bilingual BSARD: Extending Statutory Article Retrieval to Dutch	Dec 10, 2024	Articles, Benchmarking	Code Available	5
Beemo: Benchmark of Expert-edited Machine-generated Outputs	Nov 6, 2024	Benchmarking	Code Available	5
Bias Reduction via Cooperative Bargaining in Synthetic Graph Dataset Generation	May 27, 2022	Benchmarking, Dataset Generation	Code Available	5
AdamZ: An Enhanced Optimisation Method for Neural Network Training	Nov 22, 2024	Benchmarking	Code Available	5
Bias Analysis and Mitigation in the Evaluation of Authorship Verification	Jul 1, 2019	Authorship Verification, Benchmarking	Code Available	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified