SOTAVerified

Benchmarking

Papers

Showing 111–120 of 5548 papers

Title	Date	Tasks	Status	Hype
IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments	Jun 11, 2025	Benchmarking	Code Available	2
FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models	Jun 11, 2025	Benchmarking, Federated Learning	Unverified	0
Attention, Please! Revisiting Attentive Probing for Masked Image Modeling	Jun 11, 2025	Benchmarking, Computational Efficiency	Code Available	1
A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild	Jun 11, 2025	Age Estimation, Benchmarking	Code Available	0
GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments	Jun 11, 2025	Active Learning, Benchmarking	Unverified	0
Graph Attention-based Decentralized Actor-Critic for Dual-Objective Control of Multi-UAV Swarms	Jun 10, 2025	Benchmarking, Graph Attention	Unverified	0
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data	Jun 10, 2025	Benchmarking, Data Augmentation	Code Available	1
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling	Jun 10, 2025	Benchmarking	Code Available	1
AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP	Jun 10, 2025	Benchmarking, Sentiment Analysis	Unverified	0
Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens	Jun 10, 2025	Benchmarking, Mathematical Reasoning	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified