SOTAVerified

Benchmarking

Papers

Showing 1411-1420 of 5548 papers

Title | Status | Hype
Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms | Code | 1
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning | Code | 1
Ego-Body Pose Estimation via Ego-Head Pose Estimation | Code | 1
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning | Code | 1
Introducing Milabench: Benchmarking Accelerators for AI | Code | 1
EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search | Code | 1
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | Code | 1
BEND: Benchmarking DNA Language Models on biologically meaningful tasks | Code | 1
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography | Code | 1
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified