SOTAVerified

Benchmarking

Papers

Showing 1351–1360 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Code | 1 |
| Benchmarking Image Retrieval for Visual Localization | Code | 1 |
| LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction | Code | 1 |
| Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification | Code | 1 |
| A Unified Taxonomy and Multimodal Dataset for Events in Invasion Games | Code | 1 |
| Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle? | Code | 1 |
| ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Code | 1 |
| Combinatorial Optimization with Policy Adaptation using Latent Space Search | Code | 1 |
| Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs | Code | 1 |
| Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |