SOTAVerified

Benchmarking

Papers

Showing 3541-3550 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering | | 0 |
| Knowledge-guided Contextual Gene Set Analysis Using Large Language Models | | 0 |
| MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine | | 0 |
| MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models | | 0 |
| MediaEval 2018: Predicting Media Memorability Task | | 0 |
| Benchmarking Large Language Models for Handwritten Text Recognition | | 0 |
| MedMeshCNN -- Enabling MeshCNN for Medical Surface Models | | 0 |
| Benchmarking large language models for materials synthesis: the case of atomic layer deposition | | 0 |
| Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents | | 0 |
| MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |