SOTAVerified

Benchmarking

Papers

Showing 671-680 of 5548 papers

Title | Status | Hype
Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization | Code | 1
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Code | 1
CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Code | 1
Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? | Code | 1
CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Code | 1
Attention, Please! Revisiting Attentive Probing for Masked Image Modeling | Code | 1
A Unified Taxonomy and Multimodal Dataset for Events in Invasion Games | Code | 1
ClearPose: Large-scale Transparent Object Dataset and Benchmark | Code | 1
CLoG: Benchmarking Continual Learning of Image Generation Models | Code | 1
Beacon, a lightweight deep reinforcement learning benchmark library for flow control | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified