SOTAVerified

Benchmarking

Papers

Showing 481-490 of 5548 papers

Title | Status | Hype
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling | Code | 1
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation | Code | 1
CattleFace-RGBT: RGB-T Cattle Facial Landmark Benchmark | Code | 1
Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels | Code | 1
CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection | Code | 1
ArtFID: Quantitative Evaluation of Neural Style Transfer | Code | 1
Restore Anything Model via Efficient Degradation Adaptation | Code | 1
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Code | 1
CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version) | Code | 1
COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified