SOTAVerified

Benchmarking

Papers

Showing 361-370 of 5548 papers

Title | Status | Hype
Craftium: An Extensible Framework for Creating Reinforcement Learning Environments | Code | 2
Multitask Prompted Training Enables Zero-Shot Task Generalization | Code | 2
COALA: A Practical and Vision-Centric Federated Learning Platform | Code | 2
Class-incremental Learning for Time Series: Benchmark and Evaluation | Code | 2
ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling | Code | 2
nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark | Code | 2
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models | Code | 2
CausalGym: Benchmarking causal interpretability methods on linguistic tasks | Code | 2
Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations | Code | 2
BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified