Benchmarking

Papers

Showing 1521–1530 of 5548 papers

Title	Date	Tasks	Status	Hype
CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks	Jul 3, 2025	Benchmarking, Code Generation	Unverified	0
LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction	Jul 3, 2025	Benchmarking	Code Available	0
TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation	Jul 1, 2025	Benchmarking, Machine Translation	Unverified	0
State and Memory is All You Need for Robust and Reliable AI Agents	Jun 30, 2025	All, Benchmarking	Unverified	0
Point Cloud Compression and Objective Quality Assessment: A Survey	Jun 28, 2025	Autonomous Driving, Benchmarking	Unverified	0
Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation	Jun 26, 2025	Benchmarking, Transfer Learning	Code Available	0
mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale	Jun 26, 2025	Anomaly Detection, Benchmarking	Code Available	0
FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation	Jun 26, 2025	Attribute, Benchmarking	Unverified	0
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge	Jun 26, 2025	Benchmarking	Unverified	0
FixCLR: Negative-Class Contrastive Learning for Semi-Supervised Domain Generalization	Jun 25, 2025	Benchmarking, Contrastive Learning	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified