SOTAVerified

Benchmarking

Papers

Showing 3441–3450 of 5548 papers

Title | Status | Hype
MTG: A Benchmarking Suite for Multilingual Text Generation | | 0
MTLens: Machine Translation Output Debugging | | 0
MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark | | 0
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA | | 0
Mukayese: Turkish NLP Strikes Back | | 0
Multicalibration for Confidence Scoring in LLMs | | 0
Multi-Camera Action Dataset for Cross-Camera Action Recognition Benchmarking | | 0
Multi-channel deep convolutional neural networks for multi-classifying thyroid disease | | 0
Multiclass Optimal Classification Trees with SVM-splits | | 0
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified