Multimodal Large Language Model

Papers

Showing 201–250 of 347 papers

Title	Date	Tasks	Status
Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips	Oct 1, 2023	Language Modeling, Language Modelling	Unverified
CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step	Jul 6, 2025	Denoising, Large Language Model	Unverified
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model	Nov 19, 2024	Information Retrieval, Language Modeling	Unverified
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic	Jul 25, 2024	Image to text, Language Modeling	Unverified
Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference	Sep 18, 2024	Image Captioning, Large Language Model	Unverified
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation	Dec 10, 2024	Image Generation, Language Modelling	Unverified
Distraction is All You Need for Multimodal Large Language Model Jailbreaking	Feb 15, 2025	All, Language Modeling	Unverified
DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing	Sep 2, 2024	Image Generation, Language Modelling	Unverified
DreamJourney: Perpetual View Generation with Video Diffusion Models	Jun 21, 2025	Image to 3D, Large Language Model	Unverified
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation	Dec 4, 2024	Image Generation, Large Language Model	Unverified
EAGLE: Egocentric AGgregated Language-video Engine	Sep 26, 2024	Action Recognition, Activity Recognition	Unverified
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM	Dec 12, 2024	Image Comprehension, Image Generation	Unverified
EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM	Dec 5, 2024	Image Manipulation, Language Modeling	Unverified
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model	Aug 21, 2024	Computational Efficiency, Language Modeling	Unverified
Efficient Indirect LLM Jailbreak via Multimodal-LLM Jailbreak	May 30, 2024	Language Modeling, Language Modelling	Unverified
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios	Dec 5, 2024	Language Modeling, Language Modelling	Unverified
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model	Dec 5, 2023	Boundary Detection, Language Modeling	Unverified
EventVL: Understand Event Streams via Multimodal Large Language Model	Jan 23, 2025	Event-based vision, Language Modeling	Unverified
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Dec 6, 2024	document understanding, Hallucination	Unverified
FaceInsight: A Multimodal Large Language Model for Face Perception	Apr 22, 2025	Language Modeling, Language Modelling	Unverified
Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning	Apr 9, 2025	Action Unit Detection, Age Estimation	Unverified
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms	Oct 24, 2024	Diversity, Language Modeling	Unverified
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization	Oct 14, 2024	Explanation Generation, Image Forgery Detection	Unverified
From Street Views to Urban Science: Discovering Road Safety Factors with Multimodal Large Language Models	Jun 2, 2025	Large Language Model, Multimodal Large Language Model	Unverified
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing	Jul 8, 2024	Image Generation, Language Modeling	Unverified
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing	Mar 16, 2025	Change Detection, Image Captioning	Unverified
Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders	Feb 18, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation	Nov 25, 2023	Instruction Following, Language Modeling	Unverified
Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models	Jul 26, 2024	Disentanglement, Language Modeling	Unverified
GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model	Jan 1, 2025	Attribute, Language Modeling	Unverified
Guard Me If You Know Me: Protecting Specific Face-Identity from Deepfakes	May 26, 2025	DeepFake Detection, Face Generation	Unverified
Guardrails for avoiding harmful medical product recommendations and off-label promotion in generative AI models	Jun 24, 2024	Language Modeling, Language Modelling	Unverified
GUIDE: Graphical User Interface Data for Execution	Apr 9, 2024	Language Modelling, Large Language Model	Unverified
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition	Mar 22, 2024	Language Modelling, Large Language Model	Unverified
HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model	Mar 17, 2025	Language Modeling, Language Modelling	Unverified
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval	May 21, 2025	Attribute, Image Retrieval	Unverified
HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning	May 23, 2025	Large Language Model, Multimodal Large Language Model	Unverified
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites	Apr 25, 2024	4k, Language Modeling	Unverified
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model	Nov 10, 2023	Image Captioning, Language Modeling	Unverified
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification	May 21, 2025	Data Augmentation, Large Language Model	Unverified
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding	Jan 25, 2025	Action Understanding, Emotion Recognition	Unverified
Hybrid Agents for Image Restoration	Mar 13, 2025	Image Restoration, In-Context Learning	Unverified
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance	Dec 9, 2024	Image Generation, Language Modeling	Unverified
Imaginations of WALL-E: Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems	Aug 20, 2023	Emotion Recognition, Language Modelling	Unverified
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models	Apr 14, 2025	Language Modeling, Language Modelling	Unverified
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks	Oct 24, 2024	image-classification, Image Classification	Unverified
Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics	Jan 16, 2025	Large Language Model, Multimodal Large Language Model	Unverified
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models	Jan 3, 2025	Binary Classification, Face Anti-Spoofing	Unverified
Investigating the Catastrophic Forgetting in Multimodal Large Language Models	Sep 19, 2023	image-classification, Image Classification	Unverified
Is your multimodal large language model a good science tutor?	May 9, 2025	Language Modeling, Language Modelling	Unverified

