SOTAVerified

Benchmarking

Papers

Showing 301–310 of 5548 papers

Title | Status | Hype
A Survey on Multimodal Benchmarks: In the Era of Large AI Models | Code | 2
DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation | Code | 2
Datasets and Benchmarks for Offline Safe Reinforcement Learning | Code | 2
Authorship Obfuscation in Multilingual Machine-Generated Text Detection | Code | 2
FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis | Code | 2
A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning | Code | 2
A large annotated medical image dataset for the development and evaluation of segmentation algorithms | Code | 2
CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions | Code | 2
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models | Code | 2
Customizable Perturbation Synthesis for Robust SLAM Benchmarking | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified