SOTAVerified

Benchmarking

Papers

Showing 1401-1410 of 5548 papers

Title | Status | Hype
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning |  | 0
PC-Gym: Benchmark Environments For Process Control Problems | Code | 2
Image2Struct: Benchmarking Structure Extraction for Vision-Language Models |  | 0
SS3DM: Benchmarking Street-View Surface Reconstruction with a Synthetic 3D Mesh Dataset |  | 0
AI Cyber Risk Benchmark: Automated Exploitation Capabilities |  | 0
Benchmarking LLM Guardrails in Handling Multilingual Toxicity |  | 0
Benchmarking Human and Automated Prompting in the Segment Anything Model | Code | 0
Exploring Capabilities of Time Series Foundation Models in Building Analytics |  | 0
Project MPG: towards a generalized performance benchmark for LLM capabilities |  | 0
LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment | Code | 1
Page 141 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified