SOTAVerified

Benchmarking

Papers

Showing 291-300 of 5548 papers

Title | Status | Hype
--- | --- | ---
Assessing SPARQL capabilities of Large Language Models | Code | 2
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Code | 2
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model | Code | 2
Deep Visual Geo-localization Benchmark | Code | 2
Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint) | Code | 2
MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation | Code | 2
EasyTPP: Towards Open Benchmarking Temporal Point Processes | Code | 2
Evaluating Large-Vocabulary Object Detectors: The Devil is in the Details | Code | 2
A Survey on Multimodal Benchmarks: In the Era of Large AI Models | Code | 2
Fortuna: A Library for Uncertainty Quantification in Deep Learning | Code | 2
Page 30 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
--- | --- | --- | --- | --- | ---
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified