Benchmarking

Papers


Showing 4526–4550 of 5548 papers

Title	Date	Tasks	Status
Bugs in the Data: How ImageNet Misrepresents Biodiversity	Aug 24, 2022	Benchmarking, Object Detection	Code Available
inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences	Jun 25, 2025	Benchmarking	Code Available
LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English	Oct 12, 2024	Benchmarking	Code Available
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion	May 28, 2023	Benchmarking, Decision Making	Code Available
Individual Fairness Guarantees for Neural Networks	May 11, 2022	Benchmarking, Fairness	Code Available
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context	Mar 29, 2024	Benchmarking, Sentence	Code Available
LibOPT: An Open-Source Platform for Fast Prototyping Soft Optimization Techniques	Apr 18, 2017	Benchmarking	Code Available
BubGAN: Bubble Generative Adversarial Networks for Synthesizing Realistic Bubbly Flow Images	Sep 7, 2018	Benchmarking	Code Available
bsnsing: A decision tree induction method based on recursive optimal boolean rule composition	May 30, 2022	Benchmarking	Code Available
Rethinking Empirical Evaluation of Adversarial Robustness Using First-Order Attack Methods	Jun 1, 2020	Adversarial Robustness, Benchmarking	Code Available
Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples	Feb 6, 2025	Benchmarking, DeepFake Detection	Code Available
BSBench: will your LLM find the largest prime number?	Jun 5, 2025	Benchmarking	Code Available
Light Field Saliency Detection with Deep Convolutional Networks	Jun 19, 2019	Benchmarking, Saliency Detection	Code Available
Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning	Apr 4, 2021	Benchmarking, Multi Label Text Classification	Code Available
Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation	Apr 29, 2025	Benchmarking, Fairness	Code Available
An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science	Feb 23, 2025	Benchmarking, Code Generation	Code Available
Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs	Jul 6, 2024	Benchmarking, Dataset Generation	Code Available
On-orbit model training for satellite imagery with label proportions	Jun 21, 2023	Benchmarking, Earth Observation	Code Available
LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping	Feb 27, 2025	Benchmarking	Code Available
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture	Jun 10, 2024	Benchmarking, Decoder	Code Available
Rethinking the Reference-based Distinctive Image Captioning	Jul 22, 2022	Attribute, Benchmarking	Code Available
Linear energy storage and flexibility model with ramp rate, ramping, deadline and capacity constraints	Sep 12, 2024	Benchmarking	Code Available
BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception	Feb 7, 2024	Benchmarking	Code Available
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery	Jan 2, 2025	Benchmarking, Experimental Design	Code Available
BONES: a Benchmark fOr Neural Estimation of Shapley values	Jul 23, 2024	Benchmarking	Code Available


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified