SOTAVerified

Benchmarking

Papers

Showing 2441-2450 of 5548 papers

Title | Status | Hype
Hierarchical Neural Networks for Sequential Sentence Classification in Medical Scientific Abstracts | Code | 0
Strong and Simple Baselines for Multimodal Utterance Embeddings | Code | 0
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Code | 0
Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams | Code | 0
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models | Code | 0
Benchmarking Large Language Models for Math Reasoning Tasks | Code | 0
Benchmarking Large Language Models for Image Classification of Marine Mammals | Code | 0
Divergent Creativity in Humans and Large Language Models | Code | 0
Generalization and Regularization in DQN | Code | 0
GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified