SOTAVerified

Benchmarking

Papers

Showing 321–330 of 5548 papers

Title | Status | Hype
Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception | Code | 2
AiTLAS: Artificial Intelligence Toolbox for Earth Observation | Code | 2
Datasets and Benchmarks for Offline Safe Reinforcement Learning | Code | 2
Immersive Neural Graphics Primitives | Code | 2
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | Code | 2
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models | Code | 2
Deep Visual Geo-localization Benchmark | Code | 2
A large annotated medical image dataset for the development and evaluation of segmentation algorithms | Code | 2
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback | Code | 2
Craftium: An Extensible Framework for Creating Reinforcement Learning Environments | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified