Benchmarking

Papers

Showing 3651–3660 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Alexpaca: Learning Factual Clarification Question Generation Without Examples	Oct 17, 2023	Benchmarking, Chatbot	Unverified	0	0
Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech	Jun 9, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0	0
Benchmarking Foundation Models with Language-Model-as-an-Examiner	Jun 7, 2023	Benchmarking, Language Modeling	Unverified	0	0
Benchmarking Foundation Models for Zero-Shot Biometric Tasks	May 30, 2025	Attribute, Benchmarking	Unverified	0	0
MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents	Jun 12, 2024	Benchmarking, Language Modeling	Unverified	0	0
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases	Jun 12, 2024	Benchmarking, Model Compression	Unverified	0	0
Benchmarking foundation models as feature extractors for weakly-supervised computational pathology	Aug 28, 2024	Benchmarking, Diversity	Unverified	0	0
Model Agnostic Explainable Selective Regression via Uncertainty Estimation	Nov 15, 2023	Benchmarking, model	Unverified	0	0
Model-based trajectory stitching for improved behavioural cloning and its applications	Dec 8, 2022	Behavioural cloning, Benchmarking	Unverified	0	0
Model-Based Underwater 6D Pose Estimation from RGB	Feb 14, 2023	2D Object Detection, 6D Pose Estimation	Unverified	0	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified