Benchmarking

Papers

Showing 481–490 of 5548 papers

Title	Date	Tasks	Status	Hype
A Survey of Pathology Foundation Model: Progress and Future Directions	Apr 5, 2025	Benchmarking, Multiple Instance Learning	Code Available	1
Generative Evaluation of Complex Reasoning in Large Language Models	Apr 3, 2025	Benchmarking, Memorization	Code Available	1
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing	Apr 2, 2025	3D Reconstruction, Benchmarking	Code Available	1
SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers	Mar 31, 2025	Benchmarking	Code Available	1
EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos	Mar 28, 2025	Benchmarking, Question Answering	Code Available	1
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs	Mar 27, 2025	Attribute, Benchmarking	Code Available	1
A Comprehensive Benchmark for RNA 3D Structure-Function Modeling	Mar 27, 2025	Benchmarking, Deep Learning	Code Available	1
The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs	Mar 25, 2025	Benchmarking, Scene Segmentation	Code Available	1
NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios	Mar 25, 2025	Benchmarking, Offline RL	Code Available	1
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models	Mar 25, 2025	Benchmarking, Image Captioning	Code Available	1

Page 49 of 555

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified