SOTAVerified

Benchmarking

Papers

Showing 81-90 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis | Code | 3 |
| HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis | Code | 3 |
| Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks | Code | 3 |
| Advancing LLM Reasoning Generalists with Preference Trees | Code | 3 |
| Benchmarking Automatic Machine Learning Frameworks | Code | 3 |
| ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems | Code | 3 |
| DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection | Code | 3 |
| CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving | Code | 3 |
| AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking | Code | 3 |
| Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning | Code | 3 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |