SOTAVerified

Benchmarking

Papers

Showing 2441–2450 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Hierarchical Neural Networks for Sequential Sentence Classification in Medical Scientific Abstracts	Aug 19, 2018	Benchmarking, Classification	Code Available	0	5
Strong and Simple Baselines for Multimodal Utterance Embeddings	May 14, 2019	Benchmarking	Code Available	0	5
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider	Apr 26, 2025	Benchmarking, GPU	Code Available	0	5
Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams	Jun 17, 2024	All, Benchmarking	Code Available	0	5
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models	Jun 8, 2023	Benchmarking, Fairness	Code Available	0	5
Benchmarking Large Language Models for Math Reasoning Tasks	Aug 20, 2024	Benchmarking, In-Context Learning	Code Available	0	5
Benchmarking Large Language Models for Image Classification of Marine Mammals	Oct 22, 2024	Benchmarking, image-classification	Code Available	0	5
Divergent Creativity in Humans and Large Language Models	May 13, 2024	Benchmarking	Code Available	0	5
Generalization and Regularization in DQN	Sep 29, 2018	Atari Games, Benchmarking	Code Available	0	5
GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data	Feb 22, 2024	Benchmarking	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified