
MME

MME is a comprehensive evaluation benchmark for multimodal large language models. It measures both perception and cognition abilities on a total of 14 subtasks, including existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.
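The 14 subtasks split into ten perception tasks and four cognition tasks. As a minimal sketch of how per-subtask results roll up into the two ability scores, the snippet below groups the subtasks as listed above; the scoring rule assumed here (each subtask worth up to 200 points, summed per group) follows common MME usage but is an assumption, not something stated on this page.

```python
# Hedged sketch: aggregating per-subtask MME results into the two
# ability scores. Subtask grouping follows the description above;
# the per-subtask 0-200 scoring scale is an assumption.

PERCEPTION = [
    "existence", "count", "position", "color", "poster",
    "celebrity", "scene", "landmark", "artwork", "OCR",
]
COGNITION = [
    "commonsense reasoning", "numerical calculation",
    "text translation", "code reasoning",
]

def aggregate(subtask_scores: dict) -> dict:
    """Sum per-subtask scores into perception and cognition totals."""
    return {
        "perception": sum(subtask_scores[t] for t in PERCEPTION),
        "cognition": sum(subtask_scores[t] for t in COGNITION),
    }

# Toy example: a model scoring 150 (of an assumed 200) on every subtask.
scores = {task: 150.0 for task in PERCEPTION + COGNITION}
totals = aggregate(scores)
print(totals)  # {'perception': 1500.0, 'cognition': 600.0}
```

Under this assumed scale, a perfect model would total 2000 perception points and 800 cognition points.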

Papers

Showing 51-95 of 95 papers

Title | Status | Hype
Temporal Reasoning Transfer from Text to Video | | 0
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Code | 2
DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination | | 0
TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Code | 0
ParGo: Bridging Vision-Language with Partial and Global Views | Code | 1
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? | | 0
L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection | Code | 2
Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment | Code | 0
Long Context Transfer from Language to Vision | Code | 4
DrVideo: Document Retrieval Based Long Video Understanding | | 0
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | Code | 1
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | Code | 2
RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models | | 0
Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models | | 0
Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models | Code | 0
Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding | Code | 2
Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination | Code | 1
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models | Code | 1
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise | Code | 0
Silkie: Preference Distillation for Large Visual Language Models | | 0
Honeybee: Locality-enhanced Projector for Multimodal LLM | Code | 2
Prompt Highlighter: Interactive Control for Multi-Modal LLMs | Code | 1
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization | Code | 1
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | Code | 0
The Use of Symmetry for Models with Variable-size Variables | | 0
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning | Code | 1
Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model | | 0
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors | | 0
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | Code | 0
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning | Code | 2
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions | Code | 2
Domain Adaptation via Minimax Entropy for Real/Bogus Classification of Astronomical Alerts | | 0
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions | Code | 2
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models | Code | 2
Multi-Modal Evaluation Approach for Medical Image Segmentation | | 0
MAAL: Multimodality-Aware Autoencoder-Based Affordance Learning for 3D Articulated Objects | Code | 0
Masked Motion Encoding for Self-Supervised Video Representation Learning | Code | 1
MM-GNN: Mix-Moment Graph Neural Network towards Modeling Neighborhood Feature Distribution | Code | 0
MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue | | 0
Machine Learning Methods for Inferring the Number of UAV Emitters via Massive MIMO Receive Array | | 0
Online Meta-Learning for Multi-Source and Semi-Supervised Domain Adaptation | | 0
Learning Multilingual Meta-Embeddings for Code-Switching Named Entity Recognition | | 0
Deep Learning for Hybrid 5G Services in Mobile Edge Computing Systems: Learn from a Digital Twin | | 0
Scalable K-Medoids via True Error Bound and Familywise Bandits | | 0
Semi-supervised Domain Adaptation via Minimax Entropy | Code | 1
Page 2 of 2

No leaderboard results yet.