Multimodal Large Language Model

Papers

Showing 101–150 of 347 papers

Title	Date	Tasks	Status	Hype
The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units	Jun 19, 2025	Large Language Model, Multimodal Large Language Model	Code Available	1
Enhancing Time Series Forecasting via Multi-Level Text Alignment with LLMs	Apr 10, 2025	Multimodal Large Language Model, Time Series	Code Available	1
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery	Jan 20, 2025	Language Modeling, Language Modelling	Code Available	1
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space	Mar 14, 2025	Language Modeling, Language Modelling	Code Available	1
Unifying Segment Anything in Microscopy with Multimodal Large Language Model	May 16, 2025	Language Modeling, Language Modelling	Code Available	1
Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation	Aug 19, 2024	Large Language Model, Multimodal Large Language Model	Code Available	1
LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge	Nov 20, 2023	Language Modeling, Language Modelling	Code Available	1
Hespi: A pipeline for automatically detecting information from herbarium specimen sheets	Oct 11, 2024	Handwritten Text Recognition, HTR	Code Available	1
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering	Jan 1, 2025	Large Language Model, Multimodal Large Language Model	Code Available	1
Chain of Images for Intuitively Reasoning	Nov 9, 2023	Common Sense Reasoning, Language Modelling	Code Available	1
LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors	Jun 20, 2024	16k, Instruction Following	Code Available	1
Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion	May 26, 2025	Denoising, Image Generation	Code Available	1
Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model	Nov 16, 2024	Language Modeling, Language Modelling	Code Available	1
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation	Oct 17, 2024	Decision Making, Language Modeling	Code Available	1
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model	Jun 17, 2024	Language Modeling, Language Modelling	Code Available	1
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models	Aug 30, 2024	Image Captioning, Language Modeling	Code Available	1
MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection	Dec 20, 2024	Cancer Classification, Chatbot	Code Available	1
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models	Apr 1, 2024	Decision Making, Language Modeling	Code Available	1
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V	Oct 29, 2023	Diagnostic, Language Modeling	Code Available	1
Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions	Aug 5, 2024	Language Modeling, Language Modelling	Code Available	1
Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions	Mar 20, 2025	2D Object Detection, Distributed Computing	Code Available	1
MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis	Jun 23, 2025	Diagnostic, Large Language Model	Code Available	1
LMEye: An Interactive Perception Network for Large Language Models	May 5, 2023	Language Modelling, Large Language Model	Code Available	1
Meaning Typed Prompting: A Technique for Efficient, Reliable Structured Output Generation	Oct 22, 2024	Large Language Model, Multimodal Large Language Model	Code Available	1
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences	Jan 19, 2024	Language Modeling, Language Modelling	Code Available	1
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception	Mar 5, 2024	Language Modeling, Language Modelling	Code Available	1
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures	Jan 25, 2025	Large Language Model, Multimodal Large Language Model	Code Available	1
Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics	Jan 16, 2025	Large Language Model, Multimodal Large Language Model	Unverified	0
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks	Oct 24, 2024	image-classification, Image Classification	Unverified	0
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation	Dec 10, 2024	Image Generation, Language Modelling	Unverified	0
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models	Apr 14, 2025	Language Modeling, Language Modelling	Unverified	0
Can Multimodal Large Language Model Think Analogically?	Nov 2, 2024	Language Modeling, Language Modelling	Unverified	0
A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges	Dec 16, 2024	Language Modeling, Language Modelling	Unverified	0
Imaginations of WALL-E : Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems	Aug 20, 2023	Emotion Recognition, Language Modelling	Unverified	0
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance	Dec 9, 2024	Image Generation, Language Modeling	Unverified	0
Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference	Sep 18, 2024	Image Captioning, Large Language Model	Unverified	0
CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring	May 20, 2025	Automated Essay Scoring, Diversity	Unverified	0
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability	May 23, 2024	cross-modal alignment, Language Modelling	Unverified	0
Hybrid Agents for Image Restoration	Mar 13, 2025	Image Restoration, In-Context Learning	Unverified	0
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding	Jan 25, 2025	Action Understanding, Emotion Recognition	Unverified	0
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification	May 21, 2025	Data Augmentation, Large Language Model	Unverified	0
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic	Jul 25, 2024	Image to text, Language Modeling	Unverified	0
CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches	Sep 26, 2024	Language Modeling, Language Modelling	Unverified	0
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model	Nov 10, 2023	Image Captioning, Language Modeling	Unverified	0
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites	Apr 25, 2024	4k, Language Modeling	Unverified	0
HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning	May 23, 2025	Large Language Model, Multimodal Large Language Model	Unverified	0
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model	Nov 19, 2024	Information Retrieval, Language Modeling	Unverified	0
ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM	Jun 17, 2025	Hallucination, Language Modeling	Unverified	0
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval	May 21, 2025	Attribute, Image Retrieval	Unverified	0
HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model	Mar 17, 2025	Language Modeling, Language Modelling	Unverified	0

No leaderboard results yet.