Benchmarking

Papers

Showing 4811–4820 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges	May 16, 2025	Benchmarking, State Estimation	Code Available	0
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling	Nov 21, 2024	Articles, Benchmarking	Code Available	0
Benchmarking Single Image Dehazing and Beyond	Dec 12, 2017	Benchmarking, Image Dehazing	Code Available	0
VRKitchen2.0-IndoorKit: A Tutorial for Augmented Indoor Scene Building in Omniverse	Jun 23, 2022	Benchmarking, Indoor Scene Synthesis	Code Available	0
One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support	Jun 15, 2023	Benchmarking, Information Retrieval	Code Available	0
Forecasting Across Time Series Databases using Recurrent Neural Networks on Groups of Similar Series: A Clustering Approach	Oct 9, 2017	Benchmarking, Clustering	Code Available	0
fMRI-S4: learning short- and long-range dynamic fMRI dependencies using 1D Convolutions and State Space Models	Aug 8, 2022	Benchmarking, State Space Models	Code Available	0
Scaling and Benchmarking Self-Supervised Visual Representation Learning	May 3, 2019	Benchmarking, object-detection	Code Available	0
Scaling Compute Is Not All You Need for Adversarial Robustness	Dec 20, 2023	Adversarial Robustness, All	Code Available	0
Scaling Up Resonate-and-Fire Networks for Fast Deep Learning	Apr 1, 2025	Benchmarking, Deep Learning	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified