SOTAVerified

Benchmarking

Papers

Showing 351-360 of 5548 papers

Title | Status | Hype
Commit0: Library Generation from Scratch | Code | 2
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act | Code | 2
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models | Code | 2
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations | Code | 2
CoqPilot, a plugin for LLM-based generation of proofs | Code | 2
Class-incremental Learning for Time Series: Benchmark and Evaluation | Code | 2
ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling | Code | 2
Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations | Code | 2
COALA: A Practical and Vision-Centric Federated Learning Platform | Code | 2
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified