SOTAVerified

Benchmarking

Papers

Showing 221–230 of 5548 papers

Title | Status | Hype
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback | Code | 2
IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning | Code | 2
Assessing SPARQL capabilities of Large Language Models | Code | 2
Benchmarking Benchmark Leakage in Large Language Models | Code | 2
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing | Code | 2
LaMAR: Benchmarking Localization and Mapping for Augmented Reality | Code | 2
Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning | Code | 2
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model | Code | 2
Learning to Fly -- a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control | Code | 2
Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified