Benchmarking

Papers

Showing 3151–3175 of 5548 papers

Title	Date	Tasks	Status
Is margin all you need? An extensive empirical study of active learning on tabular data	Oct 7, 2022	Active Learning, All	Unverified
Benchmarking real-time monitoring strategies for ethanol production from lignocellulosic biomass	Jan 29, 2021	Benchmarking	Unverified
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations	Apr 1, 2024	Benchmarking, Math	Unverified
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval	Feb 26, 2025	Benchmarking, Code Generation	Unverified
Benchmarking real-time algorithms for in-phase auditory stimulation of low amplitude slow waves with wearable EEG devices during sleep	Mar 4, 2022	Benchmarking, Computational Efficiency	Unverified
Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?	Jul 17, 2024	Benchmarking, Sarcasm Detection	Unverified
Is Self-Supervision Enough? Benchmarking Foundation Models Against End-to-End Training for Mitotic Figure Classification	Dec 9, 2024	Benchmarking	Unverified
Is Single-View Mesh Reconstruction Ready for Robotics?	May 23, 2025	3D Reconstruction, Benchmarking	Unverified
Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images	May 30, 2024	All, Benchmarking	Unverified
Is Synthetic Dataset Reliable for Benchmarking Generalizable Person Re-Identification?	Sep 12, 2022	Benchmarking, Generalizable Person Re-identification	Unverified
Is Transfer Learning Necessary for Protein Landscape Prediction?	Oct 31, 2020	Benchmarking, Prediction	Unverified
Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages?	Mar 30, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Is Your Benchmark (Still) Useful? Dynamic Benchmarking for Code Language Models	Mar 9, 2025	Benchmarking	Unverified
The Trap of Presumed Equivalence: Artificial General Intelligence Should Not Be Assessed on the Scale of Human Intelligence	Oct 14, 2024	Benchmarking	Unverified
A Flatter Loss for Bias Mitigation in Cross-dataset Facial Age Estimation	Oct 20, 2020	Age Estimation, Benchmarking	Unverified
Is Your Paper Being Reviewed by an LLM? A New Benchmark Dataset and Approach for Detecting AI Text in Peer Review	Feb 26, 2025	Benchmarking, Text Detection	Unverified
Benchmarking Randomized Optimization Algorithms on Binary, Permutation, and Combinatorial Problem Landscapes	Jan 21, 2025	Benchmarking	Unverified
Iterated Invariant Extended Kalman Filter (IterIEKF)	Apr 16, 2024	Benchmarking	Unverified
Benchmarking Quantum Hardware for Training of Fully Visible Boltzmann Machines	Nov 14, 2016	Benchmarking	Unverified
It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives	Jun 12, 2024	All, Benchmarking	Unverified
"It's a Match!" -- A Benchmark of Task Affinity Scores for Joint Learning	Jan 7, 2023	Benchmarking, Multi-Task Learning	Unverified
iWarded: A System for Benchmarking Datalog+/- Reasoning (technical report)	Mar 15, 2021	Benchmarking, Knowledge Graphs	Unverified
IXGS-Intraoperative 3D Reconstruction from Sparse, Arbitrarily Posed Real X-rays	Apr 20, 2025	3D Reconstruction, Anatomy	Unverified
Jailbreak Distillation: Renewable Safety Benchmarking	May 28, 2025	Benchmarking, Diversity	Unverified
The Unconstrained Ear Recognition Challenge	Aug 23, 2017	Benchmarking, Person Recognition	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified