SOTAVerified

Benchmarking

Papers

Showing 271–280 of 5548 papers

Title | Status | Hype
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks | Code | 2
Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint) | Code | 2
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Code | 2
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer | Code | 2
Datasets and Benchmarks for Offline Safe Reinforcement Learning | Code | 2
BARS: Towards Open Benchmarking for Recommender Systems | Code | 2
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Code | 2
DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation | Code | 2
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation | Code | 2
Craftium: An Extensible Framework for Creating Reinforcement Learning Environments | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified