SOTAVerified

Benchmarking

Papers

Showing 1381–1390 of 5548 papers

Title | Status | Hype
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Code | 1
Insights from Benchmarking Frontier Language Models on Web App Code Generation | Code | 1
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Code | 1
Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection | Code | 1
In Search of Lost Online Test-time Adaptation: A Survey | Code | 1
PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model | Code | 1
InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models | Code | 1
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning | Code | 1
Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified