SOTAVerified

Benchmarking

Papers

Showing 311-320 of 5548 papers

Title | Status | Hype
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer | Code | 2
DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation | Code | 2
Customizable Perturbation Synthesis for Robust SLAM Benchmarking | Code | 2
GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | Code | 2
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Code | 2
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond | Code | 2
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Code | 2
AutoPenBench: Benchmarking Generative Agents for Penetration Testing | Code | 2
Datasets and Benchmarks for Offline Safe Reinforcement Learning | Code | 2
Craftium: An Extensible Framework for Creating Reinforcement Learning Environments | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified