SOTAVerified

Benchmarking

Papers

Showing 1371–1380 of 5548 papers

Title | Status | Hype
SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It) | Code | 1
RGB-D Indiscernible Object Counting in Underwater Scenes | Code | 1
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models | Code | 1
DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training | Code | 1
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning | Code | 1
IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding | Code | 1
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Code | 1
Benchmark on Drug Target Interaction Modeling from a Structure Perspective | Code | 1
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities | Code | 1
Initial recommendations for performing, benchmarking, and reporting single-cell proteomics experiments | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified