SOTAVerified

Benchmarking

Papers

Showing 3401-3450 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks | | 0 |
| MMSciBench: Benchmarking Language Models on Multimodal Scientific Problems | | 0 |
| MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines | | 0 |
| MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents | | 0 |
| MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases | | 0 |
| Model Agnostic Explainable Selective Regression via Uncertainty Estimation | | 0 |
| Model-based trajectory stitching for improved behavioural cloning and its applications | | 0 |
| Model-Based Underwater 6D Pose Estimation from RGB | | 0 |
| ModelHub.AI: Dissemination Platform for Deep Learning Models | | 0 |
| Model Lakes | | 0 |
| Modelling Neuronal Behaviour with Time Series Regression: Recurrent Neural Networks on C. Elegans Data | | 0 |
| Modelling neuronal behaviour with time series regression: Recurrent Neural Networks on synthetic C. elegans data | | 0 |
| Modelling Regional Solar Photovoltaic Capacity in Great Britain | | 0 |
| Model-predictive control and reinforcement learning in multi-energy system case studies | | 0 |
| Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities | | 0 |
| Modern CNNs for IoT Based Farms | | 0 |
| Modern, Efficient, and Differentiable Transport Equation Models using JAX: Applications to Population Balance Equations | | 0 |
| Modified CMA-ES Algorithm for Multi-Modal Optimization: Incorporating Niching Strategies and Dynamic Adaptation Mechanism | | 0 |
| ModuLM: Enabling Modular and Multimodal Molecular Relational Learning with Large Language Models | | 0 |
| MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | | 0 |
| MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes | | 0 |
| MO-IOHinspector: Anytime Benchmarking of Multi-Objective Algorithms using IOHprofiler | | 0 |
| MolMiner: Towards Controllable, 3D-Aware, Fragment-Based Molecular Design | | 0 |
| MOLTR: Multiple Object Localisation, Tracking, and Reconstruction from Monocular RGB Videos | | 0 |
| Momentum Contrastive Pre-training for Question Answering | | 0 |
| MorisienMT: A Dataset for Mauritian Creole Machine Translation | | 0 |
| Morphing Attack Detection -- Database, Evaluation Platform and Benchmarking | | 0 |
| MORSE: Semantic-ally Drive-n MORpheme SEgment-er | | 0 |
| MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models | | 0 |
| Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | | 0 |
| Movie Description | | 0 |
| MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning | | 0 |
| Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking | | 0 |
| MozzaVID: Mozzarella Volumetric Image Dataset | | 0 |
| MPCLeague: Robust MPC Platform for Privacy-Preserving Machine Learning | | 0 |
| MRAnnotator: multi-Anatomy and many-Sequence MRI segmentation of 44 structures | | 0 |
| MSAMSum: Towards Benchmarking Multi-lingual Dialogue Summarization | | 0 |
| MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception | | 0 |
| MS MARCO: Benchmarking Ranking Models in the Large-Data Regime | | 0 |
| MSQA: Benchmarking LLMs on Graduate-Level Materials Science Reasoning and Knowledge | | 0 |
| MTG: A Benchmarking Suite for Multilingual Text Generation | | 0 |
| MTLens: Machine Translation Output Debugging | | 0 |
| MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark | | 0 |
| Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA | | 0 |
| Mukayese: Turkish NLP Strikes Back | | 0 |
| Multicalibration for Confidence Scoring in LLMs | | 0 |
| Multi-Camera Action Dataset for Cross-Camera Action Recognition Benchmarking | | 0 |
| Multiclass Optimal Classification Trees with SVM-splits | | 0 |
| Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |