SOTAVerified

Benchmarking

Papers

Showing 351–360 of 5548 papers

Title | Status | Hype
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models | Code | 2
MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation | Code | 2
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act | Code | 2
Multitask Prompted Training Enables Zero-Shot Task Generalization | Code | 2
Assessing SPARQL capabilities of Large Language Models | Code | 2
Class-incremental Learning for Time Series: Benchmark and Evaluation | Code | 2
Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations | Code | 2
ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling | Code | 2
CoqPilot, a plugin for LLM-based generation of proofs | Code | 2
BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified