
MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities across a total of 14 subtasks: existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.
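The 14 subtasks split between the two measured abilities: the first ten listed above are perception subtasks and the last four are cognition subtasks, following the grouping in the MME paper. The sketch below shows that grouping in Python; the example scores and the simple per-ability summation are illustrative assumptions, not MME's official evaluation code.

```python
# Grouping of MME's 14 subtasks by the ability they measure
# (perception vs. cognition), as described in the MME paper.
MME_SUBTASKS = {
    "perception": [
        "existence", "count", "position", "color", "poster",
        "celebrity", "scene", "landmark", "artwork", "OCR",
    ],
    "cognition": [
        "commonsense reasoning", "numerical calculation",
        "text translation", "code reasoning",
    ],
}


def aggregate_scores(per_subtask: dict[str, float]) -> dict[str, float]:
    """Sum per-subtask scores into perception and cognition totals."""
    return {
        ability: sum(per_subtask.get(name, 0.0) for name in names)
        for ability, names in MME_SUBTASKS.items()
    }


if __name__ == "__main__":
    # Hypothetical per-subtask scores (MME reports up to 200 per subtask).
    example = {name: 150.0 for names in MME_SUBTASKS.values() for name in names}
    print(aggregate_scores(example))
```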

Papers

Showing 81–90 of 95 papers

Title | Status | Hype
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions | Code | 2
Domain Adaptation via Minimax Entropy for Real/Bogus Classification of Astronomical Alerts | – | 0
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions | Code | 2
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models | Code | 2
Multi-Modal Evaluation Approach for Medical Image Segmentation | – | 0
MAAL: Multimodality-Aware Autoencoder-Based Affordance Learning for 3D Articulated Objects | Code | 0
Masked Motion Encoding for Self-Supervised Video Representation Learning | Code | 1
MM-GNN: Mix-Moment Graph Neural Network towards Modeling Neighborhood Feature Distribution | Code | 0
MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue | – | 0
Machine Learning Methods for Inferring the Number of UAV Emitters via Massive MIMO Receive Array | – | 0
Page 9 of 10

No leaderboard results yet.