SOTAVerified

Benchmarking

Papers

Showing 4361-4370 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Wildfire Forecasting with Satellite Images and Deep Generative Model | | 0 |
| WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences | | 0 |
| Window-of-interest based Multi-objective Evolutionary Search for Satisficing Concepts | | 0 |
| WiSoSuper: Benchmarking Super-Resolution Methods on Wind and Solar Data | | 0 |
| Word Complexity Estimation for Japanese Lexical Simplification | | 0 |
| WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models | | 0 |
| Writing as a testbed for open ended agents | | 0 |
| xai_evals : A Framework for Evaluating Post-Hoc Local Explanation Methods | | 0 |
| XCSP3: An Integrated Format for Benchmarking Combinatorial Constrained Problems | | 0 |
| XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |