SOTAVerified

Benchmarking

Papers

Showing 971–980 of 5548 papers

Title | Status | Hype
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1
MLFMF: Data Sets for Machine Learning for Mathematical Formalization | Code | 1
Coarse-to-Fine Q-attention with Learned Path Ranking | Code | 1
3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding | Code | 1
Restore Anything Model via Efficient Degradation Adaptation | Code | 1
CLoG: Benchmarking Continual Learning of Image Generation Models | Code | 1
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Code | 1
MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark | Code | 1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking | Code | 1
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified