SOTAVerified

Benchmarking

Papers

Showing 3426–3450 of 5548 papers

Title | Status | Hype
MorisienMT: A Dataset for Mauritian Creole Machine Translation | | 0
Morphing Attack Detection -- Database, Evaluation Platform and Benchmarking | | 0
MORSE: Semantic-ally Drive-n MORpheme SEgment-er | | 0
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models | | 0
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | | 0
Movie Description | | 0
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning | | 0
Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking | | 0
MozzaVID: Mozzarella Volumetric Image Dataset | | 0
MPCLeague: Robust MPC Platform for Privacy-Preserving Machine Learning | | 0
MRAnnotator: multi-Anatomy and many-Sequence MRI segmentation of 44 structures | | 0
MSAMSum: Towards Benchmarking Multi-lingual Dialogue Summarization | | 0
MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception | | 0
MS MARCO: Benchmarking Ranking Models in the Large-Data Regime | | 0
MSQA: Benchmarking LLMs on Graduate-Level Materials Science Reasoning and Knowledge | | 0
MTG: A Benchmarking Suite for Multilingual Text Generation | | 0
MTLens: Machine Translation Output Debugging | | 0
MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark | | 0
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA | | 0
Mukayese: Turkish NLP Strikes Back | | 0
Multicalibration for Confidence Scoring in LLMs | | 0
Multi-Camera Action Dataset for Cross-Camera Action Recognition Benchmarking | | 0
Multi-channel deep convolutional neural networks for multi-classifying thyroid disease | | 0
Multiclass Optimal Classification Trees with SVM-splits | | 0
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models | | 0
Page 138 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified