SOTAVerified

Benchmarking

Papers

Showing 921–930 of 5548 papers

Title | Status | Hype
Benchmarking MedMNIST dataset on real quantum hardware | - | 0
Integrating Expert Knowledge into Logical Programs via LLMs | Code | 0
Positional Encoding in Transformer-Based Time Series Models: A Survey | Code | 1
ILIAS: Instance-Level Image retrieval At Scale | Code | 1
Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models | - | 0
Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | - | 0
Knowledge-aware contrastive heterogeneous molecular graph learning | - | 0
Defining and Evaluating Visual Language Models' Basic Spatial Abilities: A Perspective from Psychometrics | - | 0
Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance | - | 0
Plant in Cupboard, Orange on Rably, Inat Aphone. Benchmarking Incremental Learning of Situation and Language Model using a Text-Simulated Situated Environment | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified