SOTAVerified

Benchmarking

Papers

Showing 721-730 of 5548 papers

Title | Status | Hype
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models | Code | 1
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography | Code | 1
Delving into Out-of-Distribution Detection with Medical Vision-Language Models | Code | 1
EMPOT: partial alignment of density maps and rigid body fitting using unbalanced Gromov-Wasserstein divergence | Code | 1
End-to-end Emotion-Cause Pair Extraction via Learning to Link | Code | 1
DependEval: Benchmarking LLMs for Repository Dependency Understanding | Code | 1
Bag of Tricks for Adversarial Training | Code | 1
Benchmarking Language Model Creativity: A Case Study on Code Generation | Code | 1
A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Code | 1
Benchmarking Language Models for Code Syntax Understanding | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified