SOTAVerified

Benchmarking

Papers

Showing 481–490 of 5548 papers

Title | Status | Hype
A Survey of Pathology Foundation Model: Progress and Future Directions | Code | 1
Generative Evaluation of Complex Reasoning in Large Language Models | Code | 1
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Code | 1
SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers | Code | 1
EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos | Code | 1
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs | Code | 1
A Comprehensive Benchmark for RNA 3D Structure-Function Modeling | Code | 1
The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs | Code | 1
NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios | Code | 1
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified