SOTAVerified

Benchmarking

Papers

Showing 361-370 of 5548 papers

Title | Status | Hype
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Code | 2
Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations | Code | 2
Class-incremental Learning for Time Series: Benchmark and Evaluation | Code | 2
Commit0: Library Generation from Scratch | Code | 2
Building Normalizing Flows with Stochastic Interpolants | Code | 2
BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese | Code | 2
BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics | Code | 2
A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning | Code | 2
Benchmarking Robustness of 3D Point Cloud Recognition Against Common Corruptions | Code | 2
Benchmarking Complex Instruction-Following with Multiple Constraints Composition | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified