SOTAVerified

Benchmarking

Papers

Showing 1401–1425 of 5548 papers

Title | Status | Hype
Benchmarking Image Retrieval for Visual Localization | Code | 1
MT-LENS: An all-in-one Toolkit for Better Machine Translation Evaluation | Code | 1
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Code | 1
Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT | Code | 1
Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all | Code | 1
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery | Code | 1
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Code | 1
Curious Hierarchical Actor-Critic Reinforcement Learning | Code | 1
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Code | 1
Multimodal Fusion via Teacher-Student Network for Indoor Action Recognition | Code | 1
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks | Code | 1
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning | Code | 1
Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments | Code | 1
CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version) | Code | 1
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset | Code | 1
MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data | Code | 1
NAS-Bench-101: Towards Reproducible Neural Architecture Search | Code | 1
NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search | Code | 1
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Code | 1
NATS-Bench: Benchmarking NAS Algorithms for Architecture Topology and Size | Code | 1
Autonomous Microscopy Experiments through Large Language Model Agents | Code | 1
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Code | 1
Autonomous Reinforcement Learning: Formalism and Benchmarking | Code | 1
CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer | Code | 1
D2S: Document-to-Slide Generation Via Query-Based Text Summarization | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified