Benchmarking

Papers


Showing 4501–4550 of 5548 papers

Title	Date	Tasks	Status
Beyond MD17: the reactive xxMD dataset	Aug 22, 2023	Benchmarking, Computational chemistry	Code Available
The biglasso Package: A Memory- and Computation-Efficient Solver for Lasso Model Fitting with Big Data in R	Jan 20, 2017	Benchmarking	Code Available
Learning to Transfer for Traffic Forecasting via Multi-task Learning	Nov 27, 2021	Benchmarking, Domain Adaptation	Code Available
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning	Jan 8, 2025	Benchmarking	Code Available
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions	Oct 18, 2023	Benchmarking, Visual Grounding	Code Available
Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance	Sep 22, 2024	AutoML, Benchmarking	Code Available
BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation	Nov 14, 2024	Adversarial Attack, Adversarial Robustness	Code Available
RerrFact: Reduced Evidence Retrieval Representations for Scientific Claim Verification	Feb 5, 2022	Benchmarking, Binary Classification	Code Available
Inverse Contextual Bandits: Learning How Behavior Evolves over Time	Jul 13, 2021	Benchmarking, Decision Making	Code Available
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models	Oct 17, 2024	Benchmarking	Code Available
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM	Oct 8, 2014	Benchmarking	Code Available
INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition	Jun 10, 2024	Benchmarking, Emotion Recognition	Code Available
Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models	Mar 11, 2025	Benchmarking, Hyperparameter Optimization	Code Available
BdSLW60: A Word-Level Bangla Sign Language Dataset	Feb 13, 2024	Benchmarking, Gesture Recognition	Code Available
The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse	Feb 15, 2024	Benchmarking, Model Editing	Code Available
Integrating Expert Knowledge into Logical Programs via LLMs	Feb 17, 2025	Benchmarking, Logical Reasoning	Code Available
The CaLiGraph Ontology as a Challenge for OWL Reasoners	Oct 11, 2021	Benchmarking, Knowledge Graphs	Code Available
The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning	Jun 9, 2025	Active Learning, Benchmarking	Code Available
Strong and Simple Baselines for Multimodal Utterance Embeddings	May 14, 2019	Benchmarking	Code Available
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition	Dec 23, 2021	Benchmarking, Deep Learning	Code Available
The Collective Knowledge project: making ML models more portable and reproducible with open APIs, reusable best practices and MLOps	Jun 12, 2020	Benchmarking, object-detection	Code Available
a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification	Mar 3, 2024	Benchmarking, Speaker Verification	Code Available
Resource Interoperability for Sustainable Benchmarking: The Case of Events	May 1, 2018	Benchmarking	Code Available
Bayesian Neural Networks with Soft Evidence	Oct 19, 2020	Benchmarking	Code Available
BASED: Benchmarking, Analysis, and Structural Estimation of Deblurring	May 27, 2023	Benchmarking, Deblurring	Code Available
Bugs in the Data: How ImageNet Misrepresents Biodiversity	Aug 24, 2022	Benchmarking, Object Detection	Code Available
inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences	Jun 25, 2025	Benchmarking	Code Available
LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English	Oct 12, 2024	Benchmarking	Code Available
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion	May 28, 2023	Benchmarking, Decision Making	Code Available
Individual Fairness Guarantees for Neural Networks	May 11, 2022	Benchmarking, Fairness	Code Available
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context	Mar 29, 2024	Benchmarking, Sentence	Code Available
LibOPT: An Open-Source Platform for Fast Prototyping Soft Optimization Techniques	Apr 18, 2017	Benchmarking	Code Available
BubGAN: Bubble Generative Adversarial Networks for Synthesizing Realistic Bubbly Flow Images	Sep 7, 2018	Benchmarking	Code Available
bsnsing: A decision tree induction method based on recursive optimal boolean rule composition	May 30, 2022	Benchmarking	Code Available
Rethinking Empirical Evaluation of Adversarial Robustness Using First-Order Attack Methods	Jun 1, 2020	Adversarial Robustness, Benchmarking	Code Available
Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples	Feb 6, 2025	Benchmarking, DeepFake Detection	Code Available
BSBench: will your LLM find the largest prime number?	Jun 5, 2025	Benchmarking	Code Available
Light Field Saliency Detection with Deep Convolutional Networks	Jun 19, 2019	Benchmarking, Saliency Detection	Code Available
Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning	Apr 4, 2021	Benchmarking, Multi Label Text Classification	Code Available
Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation	Apr 29, 2025	Benchmarking, Fairness	Code Available
An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science	Feb 23, 2025	Benchmarking, Code Generation	Code Available
Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs	Jul 6, 2024	Benchmarking, Dataset Generation	Code Available
On-orbit model training for satellite imagery with label proportions	Jun 21, 2023	Benchmarking, Earth Observation	Code Available
LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping	Feb 27, 2025	Benchmarking	Code Available
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture	Jun 10, 2024	Benchmarking, Decoder	Code Available
Rethinking the Reference-based Distinctive Image Captioning	Jul 22, 2022	Attribute, Benchmarking	Code Available
Linear energy storage and flexibility model with ramp rate, ramping, deadline and capacity constraints	Sep 12, 2024	Benchmarking	Code Available
BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception	Feb 7, 2024	Benchmarking	Code Available
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery	Jan 2, 2025	Benchmarking, Experimental Design	Code Available
BONES: a Benchmark fOr Neural Estimation of Shapley values	Jul 23, 2024	Benchmarking	Code Available



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified