SOTAVerified

Benchmarking

Papers

Showing 1431–1440 of 5548 papers

Title | Status | Hype
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling | Code | 1
A Critical Assessment of State-of-the-Art in Entity Alignment | Code | 1
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset | Code | 1
NewsRecLib: A PyTorch-Lightning Library for Neural News Recommendation | Code | 1
AQuA: A Benchmarking Tool for Label Quality Assessment | Code | 1
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation | Code | 1
nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation | Code | 1
APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond | Code | 1
CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks | Code | 1
CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions | Code | 1
Page 144 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified