Benchmarking

Papers

Showing 3151–3200 of 5548 papers

Title	Date	Tasks	Status
Is margin all you need? An extensive empirical study of active learning on tabular data	Oct 7, 2022	Active Learning, All	Unverified
Benchmarking real-time monitoring strategies for ethanol production from lignocellulosic biomass	Jan 29, 2021	Benchmarking	Unverified
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations	Apr 1, 2024	Benchmarking, Math	Unverified
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval	Feb 26, 2025	Benchmarking, Code Generation	Unverified
Benchmarking real-time algorithms for in-phase auditory stimulation of low amplitude slow waves with wearable EEG devices during sleep	Mar 4, 2022	Benchmarking, Computational Efficiency	Unverified
Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?	Jul 17, 2024	Benchmarking, Sarcasm Detection	Unverified
Is Self-Supervision Enough? Benchmarking Foundation Models Against End-to-End Training for Mitotic Figure Classification	Dec 9, 2024	Benchmarking	Unverified
Is Single-View Mesh Reconstruction Ready for Robotics?	May 23, 2025	3D Reconstruction, Benchmarking	Unverified
Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images	May 30, 2024	All, Benchmarking	Unverified
Is Synthetic Dataset Reliable for Benchmarking Generalizable Person Re-Identification?	Sep 12, 2022	Benchmarking, Generalizable Person Re-identification	Unverified
Is Transfer Learning Necessary for Protein Landscape Prediction?	Oct 31, 2020	Benchmarking, Prediction	Unverified
Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages?	Mar 30, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Is Your Benchmark (Still) Useful? Dynamic Benchmarking for Code Language Models	Mar 9, 2025	Benchmarking	Unverified
The Trap of Presumed Equivalence: Artificial General Intelligence Should Not Be Assessed on the Scale of Human Intelligence	Oct 14, 2024	Benchmarking	Unverified
A Flatter Loss for Bias Mitigation in Cross-dataset Facial Age Estimation	Oct 20, 2020	Age Estimation, Benchmarking	Unverified
Is Your Paper Being Reviewed by an LLM? A New Benchmark Dataset and Approach for Detecting AI Text in Peer Review	Feb 26, 2025	Benchmarking, Text Detection	Unverified
Benchmarking Randomized Optimization Algorithms on Binary, Permutation, and Combinatorial Problem Landscapes	Jan 21, 2025	Benchmarking	Unverified
Iterated Invariant Extended Kalman Filter (IterIEKF)	Apr 16, 2024	Benchmarking	Unverified
Benchmarking Quantum Hardware for Training of Fully Visible Boltzmann Machines	Nov 14, 2016	Benchmarking	Unverified
It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives	Jun 12, 2024	All, Benchmarking	Unverified
"It's a Match!" -- A Benchmark of Task Affinity Scores for Joint Learning	Jan 7, 2023	Benchmarking, Multi-Task Learning	Unverified
iWarded: A System for Benchmarking Datalog+/- Reasoning (technical report)	Mar 15, 2021	Benchmarking, Knowledge Graphs	Unverified
IXGS-Intraoperative 3D Reconstruction from Sparse, Arbitrarily Posed Real X-rays	Apr 20, 2025	3D Reconstruction, Anatomy	Unverified
Jailbreak Distillation: Renewable Safety Benchmarking	May 28, 2025	Benchmarking, Diversity	Unverified
The Unconstrained Ear Recognition Challenge	Aug 23, 2017	Benchmarking, Person Recognition	Unverified
JANUS: Benchmarking Commercial and Open-Source Cloud and Edge Platforms for Object and Anomaly Detection Workloads	Dec 9, 2020	Anomaly Detection, Benchmarking	Unverified
Japanese to English/Chinese/Korean Datasets for Translation Quality Estimation and Automatic Post-Editing	Nov 1, 2017	Automatic Post-Editing, Benchmarking	Unverified
The Unconstrained Ear Recognition Challenge 2019 - ArXiv Version With Appendix	Mar 11, 2019	Benchmarking, Person Recognition	Unverified
Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection	Jan 28, 2025	Benchmarking	Unverified
JENGA: Object selection and pose estimation for robotic grasping from a stack	Jun 16, 2025	Benchmarking, Object	Unverified
AFAT: Adaptive Failure-Aware Tracker for Robust Visual Object Tracking	May 27, 2020	Benchmarking, Object Tracking	Unverified
Job2Vec: Job Title Benchmarking with Collective Multi-View Representation Learning	Sep 16, 2020	Benchmarking, Link Prediction	Unverified
JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models	Jun 17, 2024	Benchmarking, counterfactual	Unverified
Job recommendations: benchmarking of collaborative filtering methods for classifieds	Jan 19, 2023	Benchmarking, Collaborative Filtering	Unverified
Benchmarking quantized LLaMa-based models on the Brazilian Secondary School Exam	Sep 21, 2023	Benchmarking, Computational Efficiency	Unverified
Benchmarking Quality-Diversity Algorithms on Neuroevolution for Reinforcement Learning	Nov 4, 2022	Benchmarking, Diversity	Unverified
Benchmarking Quality-Dependent and Cost-Sensitive Score-Level Multimodal Biometric Fusion Algorithms	Nov 17, 2021	Benchmarking	Unverified
Joint Batching and Scheduling for High-Throughput Multiuser Edge AI with Asynchronous Task Arrivals	Jul 15, 2023	Benchmarking, Scheduling	Unverified
JointDistill: Adaptive Multi-Task Distillation for Joint Depth Estimation and Scene Segmentation	May 15, 2025	Benchmarking, Depth Estimation	Unverified
Joint Learning of Brain Lesion and Anatomy Segmentation from Heterogeneous Datasets	Mar 8, 2019	Anatomy, Benchmarking	Unverified
Joint Linear Precoding and DFT Beamforming Design for Massive MIMO Satellite Communication	Nov 16, 2022	Benchmarking	Unverified
Jointly learning heterogeneous features for rgb-d activity recognition	Dec 15, 2016	Activity Recognition, Benchmarking	Unverified
Jointly Learning Knowledge Embedding and Neighborhood Consensus with Relational Knowledge Distillation for Entity Alignment	Jan 25, 2022	Benchmarking, Entity Alignment	Unverified
Joint Multi-Domain Learning for Automatic Short Answer Grading	Feb 25, 2019	automatic short answer grading, Benchmarking	Unverified
Joint multi-person detection and tracking from overlapping cameras	Jun 23, 2013	Benchmarking, Human Detection	Unverified
AERO: Softmax-Only LLMs for Efficient Private Inference	Oct 16, 2024	Benchmarking, Decoder	Unverified
Joint Online Spoken Language Understanding and Language Modeling with Recurrent Neural Networks	Sep 6, 2016	Benchmarking, Intent Detection	Unverified
Joint Phase Shift Optimization and Precoder Selection for RIS-Assisted 5G NR MIMO Systems	May 29, 2025	Benchmarking	Unverified
Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking	Apr 20, 2020	3D Object Tracking, Benchmarking	Unverified
Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models	Feb 9, 2025	Benchmarking, Code Generation	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified