Multimodal Large Language Model

Papers

Showing 101–150 of 347 papers

Title	Date	Tasks	Status	Hype	Score
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception	Mar 5, 2024	Language Modeling, Language Modelling	Code Available	1	5
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models	Aug 30, 2024	Image Captioning, Language Modeling	Code Available	1	5
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V	Oct 29, 2023	Diagnostic, Language Modeling	Code Available	1	5
Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model	Nov 16, 2024	Language Modeling, Language Modelling	Code Available	1	5
Enhancing Time Series Forecasting via Multi-Level Text Alignment with LLMs	Apr 10, 2025	Multimodal Large Language Model, Time Series	Code Available	1	5
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery	Jan 20, 2025	Language Modeling, Language Modelling	Code Available	1	5
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation	Oct 17, 2024	Decision Making, Language Modeling	Code Available	1	5
Unifying Segment Anything in Microscopy with Multimodal Large Language Model	May 16, 2025	Language Modeling, Language Modelling	Code Available	1	5
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model	Jun 17, 2024	Language Modeling, Language Modelling	Code Available	1	5
When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis	Jan 17, 2025	Large Language Model, Multimodal Large Language Model	Code Available	1	5
LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge	Nov 20, 2023	Language Modeling, Language Modelling	Code Available	1	5
Voice Jailbreak Attacks Against GPT-4o	May 29, 2024	Language Modelling, Large Language Model	Code Available	1	5
Chain of Images for Intuitively Reasoning	Nov 9, 2023	Common Sense Reasoning, Language Modelling	Code Available	1	5
MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection	Dec 20, 2024	Cancer Classification, Chatbot	Code Available	1	5
MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis	Jun 23, 2025	Diagnostic, Large Language Model	Code Available	1	5
AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework	Dec 31, 2023	Large Language Model, Multimodal Large Language Model	Code Available	1	5
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model	Oct 8, 2023	Decoder, Language Modeling	Code Available	1	5
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences	Jan 19, 2024	Language Modeling, Language Modelling	Code Available	1	5
Meaning Typed Prompting: A Technique for Efficient, Reliable Structured Output Generation	Oct 22, 2024	Large Language Model, Multimodal Large Language Model	Code Available	1	5
VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model	Jun 3, 2024	Image Outpainting, Language Modeling	Code Available	1	5
Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions	Aug 5, 2024	Language Modeling, Language Modelling	Code Available	1	5
Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions	Mar 20, 2025	2D Object Detection, Distributed Computing	Code Available	1	5
LMEye: An Interactive Perception Network for Large Language Models	May 5, 2023	Language Modelling, Large Language Model	Code Available	1	5
LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors	Jun 20, 2024	16k, Instruction Following	Code Available	1	5
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations	Dec 9, 2024	Language Modeling, Language Modelling	Code Available	1	5
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models	Apr 1, 2024	Decision Making, Language Modeling	Code Available	1	5
Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning	Nov 18, 2024	Attribute, Compositional Zero-Shot Learning	Code Available	1	5
Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations	Oct 22, 2024	Camouflaged Object Segmentation, Large Language Model	Code Available	0	5
Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities	Apr 2, 2025	Descriptive, Large Language Model	Code Available	0	5
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models	Apr 14, 2025	Language Modeling, Language Modelling	Code Available	0	5
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models	May 26, 2025	image-classification, Image Classification	Code Available	0	5
TRINS: Towards Multimodal Language Models that Can Read	Jun 10, 2024	Language Modeling, Language Modelling	Code Available	0	5
TourSynbio-Search: A Large Language Model Driven Agent Framework for Unified Search Method for Protein Engineering	Nov 9, 2024	Information Retrieval, Language Modeling	Code Available	0	5
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding	Aug 30, 2024	Language Modelling, Large Language Model	Code Available	0	5
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites	Apr 25, 2024	4k, Language Modeling	Code Available	0	5
Cross-modal RAG: Sub-dimensional Retrieval-Augmented Text-to-Image Generation	May 28, 2025	Image Generation, Language Modeling	Code Available	0	5
Consistency-aware Fake Videos Detection on Short Video Platforms	Apr 30, 2025	Large Language Model, Multimodal Large Language Model	Code Available	0	5
SCA: Improve Semantic Consistent in Unrestricted Adversarial Attacks via DDPM Inversion	Oct 3, 2024	Adversarial Attack, Denoising	Code Available	0	5
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts	Nov 18, 2024	Benchmarking, Multimodal Large Language Model	Code Available	0	5
Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning	May 10, 2025	Image Augmentation, Large Language Model	Code Available	0	5
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks	Mar 6, 2025	document understanding, Language Modeling	Code Available	0	5
OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography	Jun 26, 2025	Decipherment, Large Language Model	Code Available	0	5
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models	Feb 22, 2025	document understanding, Key Information Extraction	Code Available	0	5
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering	Dec 19, 2024	Contrastive Learning, Language Modeling	Code Available	0	5
Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models	Oct 15, 2024	Hallucination, Large Language Model	Code Available	0	5
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Dec 6, 2024	document understanding, Hallucination	Code Available	0	5
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding	Jul 4, 2023	document understanding, Language Modeling	Code Available	0	5
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model	Nov 30, 2023	Language Modeling, Language Modelling	Code Available	0	5
A Survey on Multimodal Large Language Models	Jun 23, 2023	Hallucination, In-Context Learning	Code Available	0	5
MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding	Sep 10, 2024	Benchmarking, Language Modeling	Code Available	0	5

