SOTAVerified

Benchmarking

Papers

Showing 2511-2520 of 5548 papers

Title | Status | Hype
Dialogue Quality and Emotion Annotations for Customer Support Conversations | Code | 0
From raw affiliations to organization identifiers | Code | 0
From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering | Code | 0
From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning | Code | 0
From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation | Code | 0
From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology | Code | 0
From Past to Present: A Survey of Malicious URL Detection Techniques, Datasets and Code Repositories | Code | 0
Benchmarking pre-trained text embedding models in aligning built asset information | Code | 0
Benchmarking Intersectional Biases in NLP | Code | 0
DFEE: Interactive DataFlow Execution and Evaluation Kit | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified