SOTAVerified

Benchmarking

Papers

Showing 4361–4370 of 5548 papers

Title	Date	Tasks	Status	Hype
Wildfire Forecasting with Satellite Images and Deep Generative Model	Aug 19, 2022	Benchmarking, Video Prediction	Unverified	0
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences	Jun 16, 2024	Benchmarking, Spatial Reasoning	Unverified	0
Window-of-interest based Multi-objective Evolutionary Search for Satisficing Concepts	Jul 4, 2017	Benchmarking	Unverified	0
WiSoSuper: Benchmarking Super-Resolution Methods on Wind and Solar Data	Sep 17, 2021	Benchmarking, BIG-bench Machine Learning	Unverified	0
Word Complexity Estimation for Japanese Lexical Simplification	May 1, 2020	Benchmarking, Lexical Simplification	Unverified	0
WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models	May 14, 2025	Benchmarking	Unverified	0
Writing as a testbed for open ended agents	Mar 25, 2025	Benchmarking, Diversity	Unverified	0
xai_evals : A Framework for Evaluating Post-Hoc Local Explanation Methods	Feb 5, 2025	Benchmarking	Unverified	0
XCSP3: An Integrated Format for Benchmarking Combinatorial Constrained Problems	Nov 10, 2016	Benchmarking	Unverified	0
XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis	Jun 26, 2024	Autonomous Driving, Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified