SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 31–40 of 5548 papers

Title	Date	Tasks	Status	Hype
State and Memory is All You Need for Robust and Reliable AI Agents	Jun 30, 2025	AllBenchmarking	—Unverified	0
Point Cloud Compression and Objective Quality Assessment: A Survey	Jun 28, 2025	Autonomous DrivingBenchmarking	—Unverified	0
FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation	Jun 26, 2025	AttributeBenchmarking	—Unverified	0
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge	Jun 26, 2025	Benchmarking	—Unverified	0
mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale	Jun 26, 2025	Anomaly DetectionBenchmarking	CodeCode Available	0
Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation	Jun 26, 2025	BenchmarkingTransfer Learning	CodeCode Available	0
CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions	Jun 26, 2025	BenchmarkingDrug Design	CodeCode Available	1
inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences	Jun 25, 2025	Benchmarking	CodeCode Available	0
FixCLR: Negative-Class Contrastive Learning for Semi-Supervised Domain Generalization	Jun 25, 2025	BenchmarkingContrastive Learning	—Unverified	0
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans	Jun 25, 2025	Action DetectionBenchmarking	—Unverified	0

Show:10 25 50

← PrevPage 4 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified