SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1431–1440 of 5548 papers

Title	Date	Tasks	Status	Hype
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling	Jun 10, 2025	Benchmarking	CodeCode Available	1
A Critical Assessment of State-of-the-Art in Entity Alignment	Oct 30, 2020	BenchmarkingEntity Alignment	CodeCode Available	1
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset	Nov 5, 2024	BenchmarkingLanguage Modeling	CodeCode Available	1
NewsRecLib: A PyTorch-Lightning Library for Neural News Recommendation	Oct 2, 2023	BenchmarkingNews Recommendation	CodeCode Available	1
AQuA: A Benchmarking Tool for Label Quality Assessment	Jun 15, 2023	BenchmarkingLabel Error Detection	CodeCode Available	1
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation	Dec 26, 2019	BenchmarkingDomain Adaptation	CodeCode Available	1
nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation	Apr 15, 2024	BenchmarkingImage Segmentation	CodeCode Available	1
APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond	Dec 25, 2023	Animal Pose EstimationBenchmarking	CodeCode Available	1
CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks	Feb 4, 2023	Adversarial AttackAdversarial Robustness	CodeCode Available	1
CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions	Jun 26, 2025	BenchmarkingDrug Design	CodeCode Available	1

Show:10 25 50

← PrevPage 144 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified