Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 4501–4525 of 5548 papers

Title	Date	Tasks	Status
Beyond MD17: the reactive xxMD dataset	Aug 22, 2023	BenchmarkingComputational chemistry	CodeCode Available
The biglasso Package: A Memory- and Computation-Efficient Solver for Lasso Model Fitting with Big Data in R	Jan 20, 2017	Benchmarking	CodeCode Available
Learning to Transfer for Traffic Forecasting via Multi-task Learning	Nov 27, 2021	BenchmarkingDomain Adaptation	CodeCode Available
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning	Jan 8, 2025	Benchmarking	CodeCode Available
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions	Oct 18, 2023	BenchmarkingVisual Grounding	CodeCode Available
Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance	Sep 22, 2024	AutoMLBenchmarking	CodeCode Available
BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation	Nov 14, 2024	Adversarial AttackAdversarial Robustness	CodeCode Available
RerrFact: Reduced Evidence Retrieval Representations for Scientific Claim Verification	Feb 5, 2022	BenchmarkingBinary Classification	CodeCode Available
Inverse Contextual Bandits: Learning How Behavior Evolves over Time	Jul 13, 2021	BenchmarkingDecision Making	CodeCode Available
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models	Oct 17, 2024	Benchmarking	CodeCode Available
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM	Oct 8, 2014	Benchmarking	CodeCode Available
INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition	Jun 10, 2024	BenchmarkingEmotion Recognition	CodeCode Available
Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models	Mar 11, 2025	BenchmarkingHyperparameter Optimization	CodeCode Available
BdSLW60: A Word-Level Bangla Sign Language Dataset	Feb 13, 2024	BenchmarkingGesture Recognition	CodeCode Available
The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse	Feb 15, 2024	BenchmarkingModel Editing	CodeCode Available
Integrating Expert Knowledge into Logical Programs via LLMs	Feb 17, 2025	BenchmarkingLogical Reasoning	CodeCode Available
The CaLiGraph Ontology as a Challenge for OWL Reasoners	Oct 11, 2021	BenchmarkingKnowledge Graphs	CodeCode Available
The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning	Jun 9, 2025	Active LearningBenchmarking	CodeCode Available
Strong and Simple Baselines for Multimodal Utterance Embeddings	May 14, 2019	Benchmarking	CodeCode Available
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition	Dec 23, 2021	BenchmarkingDeep Learning	CodeCode Available
The Collective Knowledge project: making ML models more portable and reproducible with open APIs, reusable best practices and MLOps	Jun 12, 2020	Benchmarkingobject-detection	CodeCode Available
a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification	Mar 3, 2024	BenchmarkingSpeaker Verification	CodeCode Available
Resource Interoperability for Sustainable Benchmarking: The Case of Events	May 1, 2018	Benchmarking	CodeCode Available
Bayesian Neural Networks with Soft Evidence	Oct 19, 2020	Benchmarking	CodeCode Available
BASED: Benchmarking, Analysis, and Structural Estimation of Deblurring	May 27, 2023	BenchmarkingDeblurring	CodeCode Available

Show:10 25 50

← PrevPage 181 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified