SOTAVerified

Benchmarking

Papers

Showing 2941-2950 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation | Code | 3 |
| G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural Networks | Code | 1 |
| FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding | Code | 1 |
| LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite | Code | 1 |
| Revisiting Neural Program Smoothing for Fuzzing | Code | 1 |
| Language Models as a Service: Overview of a New Paradigm and its Challenges | | 0 |
| LawBench: Benchmarking Legal Knowledge of Large Language Models | Code | 2 |
| GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond | Code | 2 |
| The Trickle-down Impact of Reward (In-)consistency on RLHF | Code | 1 |
| OceanBench: The Sea Surface Height Edition | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |