SOTAVerified

Benchmarking

Papers

Showing 1401–1425 of 5548 papers

Title | Status | Hype
EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios | Code | 1
IOHexperimenter: Benchmarking Platform for Iterative Optimization Heuristics | Code | 1
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks | Code | 1
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Code | 1
IOHanalyzer: Detailed Performance Analyses for Iterative Optimization Heuristics | Code | 1
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery | Code | 1
Ego-Body Pose Estimation via Ego-Head Pose Estimation | Code | 1
Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset | Code | 1
PyRelationAL: a python library for active learning research and development | Code | 1
PyRobot: An Open-source Robotics Framework for Research and Benchmarking | Code | 1
Automatic sleep stage classification with deep residual networks in a mixed-cohort setting | Code | 1
EgoNormia: Benchmarking Physical Social Norm Understanding | Code | 1
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning | Code | 1
IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics | Code | 1
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning | Code | 1
Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets | Code | 1
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models | Code | 1
Recent Advances on Neural Network Pruning at Initialization | Code | 1
Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset | Code | 1
EMPOT: partial alignment of density maps and rigid body fitting using unbalanced Gromov-Wasserstein divergence | Code | 1
Autonomous Microscopy Experiments through Large Language Model Agents | Code | 1
EndoSLAM Dataset and An Unsupervised Monocular Visual Odometry and Depth Estimation Approach for Endoscopic Videos: Endo-SfMLearner | Code | 1
Autonomous Reinforcement Learning: Formalism and Benchmarking | Code | 1
Introducing Milabench: Benchmarking Accelerators for AI | Code | 1
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified