SOTAVerified

Benchmarking

Papers

Showing 191–200 of 5548 papers

Title | Status | Hype
Video Quality Assessment: A Comprehensive Survey | Code | 2
Commit0: Library Generation from Scratch | Code | 2
OpenQDC: Open Quantum Data Commons | Code | 2
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks | Code | 2
HourVideo: 1-Hour Video-Language Understanding | Code | 2
Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping | Code | 2
LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators | Code | 2
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models | Code | 2
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation | Code | 2
PC-Gym: Benchmark Environments For Process Control Problems | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified