Benchmarking

Papers

Showing 1401–1425 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios	May 22, 2025	Benchmarking	Code Available	1	5
IOHexperimenter: Benchmarking Platform for Iterative Optimization Heuristics	Nov 7, 2021	Bayesian Optimization, Benchmarking	Code Available	1	5
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks	May 13, 2021	Benchmarking, Drug Discovery	Code Available	1	5
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets	Dec 10, 2021	Benchmarking	Code Available	1	5
IOHanalyzer: Detailed Performance Analyses for Iterative Optimization Heuristics	Jul 8, 2020	Bayesian Optimization, Benchmarking	Code Available	1	5
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery	Oct 31, 2024	Benchmarking, Cloud Removal	Code Available	1	5
Ego-Body Pose Estimation via Ego-Head Pose Estimation	Dec 9, 2022	Benchmarking, Disentanglement	Code Available	1	5
Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset	Aug 12, 2024	Benchmarking	Code Available	1	5
PyRelationAL: a python library for active learning research and development	May 23, 2022	Active Learning, Benchmarking	Code Available	1	5
PyRobot: An Open-source Robotics Framework for Research and Benchmarking	Jun 19, 2019	Benchmarking, Robotic Grasping	Code Available	1	5
Automatic sleep stage classification with deep residual networks in a mixed-cohort setting	Aug 21, 2020	Automatic Sleep Stage Classification, Benchmarking	Code Available	1	5
EgoNormia: Benchmarking Physical Social Norm Understanding	Feb 27, 2025	Answer Generation, Benchmarking	Code Available	1	5
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning	Dec 11, 2023	Benchmarking, Human-Object Interaction Detection	Code Available	1	5
IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics	Oct 11, 2018	Benchmarking	Code Available	1	5
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning	May 30, 2024	Autonomous Driving, Benchmarking	Code Available	1	5
Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets	Oct 22, 2020	Articles, Benchmarking	Code Available	1	5
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models	Jun 9, 2024	Benchmarking	Code Available	1	5
Recent Advances on Neural Network Pruning at Initialization	Mar 11, 2021	Benchmarking, Network Pruning	Code Available	1	5
Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset	Jul 3, 2024	Benchmarking, Diversity	Code Available	1	5
EMPOT: partial alignment of density maps and rigid body fitting using unbalanced Gromov-Wasserstein divergence	Nov 1, 2023	Benchmarking, Cryogenic Electron Microscopy (cryo-EM)	Code Available	1	5
Autonomous Microscopy Experiments through Large Language Model Agents	Dec 18, 2024	Benchmarking, Experimental Design	Code Available	1	5
EndoSLAM Dataset and An Unsupervised Monocular Visual Odometry and Depth Estimation Approach for Endoscopic Videos: Endo-SfMLearner	Jun 30, 2020	Benchmarking, Depth Estimation	Code Available	1	5
Autonomous Reinforcement Learning: Formalism and Benchmarking	Dec 17, 2021	Benchmarking, reinforcement-learning	Code Available	1	5
Introducing Milabench: Benchmarking Accelerators for AI	Nov 18, 2024	Benchmarking, Deep Learning	Code Available	1	5
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data	Jun 10, 2025	Benchmarking, Data Augmentation	Code Available	1	5


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified