SOTAVerified

Benchmarking

Papers

Showing 1401–1410 of 5548 papers

Title | Status | Hype
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset | Code | 1
CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version) | Code | 1
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning | Code | 1
Can 3D Vision-Language Models Truly Understand Natural Language? | Code | 1
Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all | Code | 1
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery | Code | 1
CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer | Code | 1
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Code | 1
Multilingual Conceptual Coverage in Text-to-Image Models | Code | 1
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified