SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1591–1600 of 5548 papers

Title	Date	Tasks	Status	Hype
Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens	Jun 10, 2025	BenchmarkingMathematical Reasoning	—Unverified	0
Graph Attention-based Decentralized Actor-Critic for Dual-Objective Control of Multi-UAV Swarms	Jun 10, 2025	BenchmarkingGraph Attention	—Unverified	0
AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP	Jun 10, 2025	BenchmarkingSentiment Analysis	—Unverified	0
Solving excited states for long-range interacting trapped ions with neural networks	Jun 10, 2025	Benchmarking	—Unverified	0
Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech	Jun 9, 2025	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework	Jun 9, 2025	BenchmarkingFairness	—Unverified	0
GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors	Jun 9, 2025	BenchmarkingModel extraction	—Unverified	0
Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting	Jun 9, 2025	BenchmarkingDecision Making	—Unverified	0
The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning	Jun 9, 2025	Active LearningBenchmarking	CodeCode Available	0
Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding	Jun 9, 2025	BenchmarkingVideo Compression	—Unverified	0

Show:10 25 50

← PrevPage 160 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified