Multimodal Large Language Model

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 251–300 of 347 papers

Title	Date	Tasks	Status
Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders	Feb 18, 2025	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
GPT4Video: A Unified Multimodal Large Language Model for lnstruction-Followed Understanding and Safety-Aware Generation	Nov 25, 2023	Instruction FollowingLanguage Modeling	—Unverified
Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models	Jul 26, 2024	DisentanglementLanguage Modeling	—Unverified
GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model	Jan 1, 2025	AttributeLanguage Modeling	—Unverified
Guard Me If You Know Me: Protecting Specific Face-Identity from Deepfakes	May 26, 2025	DeepFake DetectionFace Generation	—Unverified
Guardrails for avoiding harmful medical product recommendations and off-label promotion in generative AI models	Jun 24, 2024	Language ModelingLanguage Modelling	—Unverified
GUIDE: Graphical User Interface Data for Execution	Apr 9, 2024	Language ModellingLarge Language Model	—Unverified
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition	Mar 22, 2024	Language ModellingLarge Language Model	—Unverified
HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model	Mar 17, 2025	Language ModelingLanguage Modelling	—Unverified
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval	May 21, 2025	AttributeImage Retrieval	—Unverified
HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning	May 23, 2025	Large Language ModelMultimodal Large Language Model	—Unverified
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites	Apr 25, 2024	4kLanguage Modeling	—Unverified
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model	Nov 10, 2023	Image CaptioningLanguage Modeling	—Unverified
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification	May 21, 2025	Data AugmentationLarge Language Model	—Unverified
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding	Jan 25, 2025	Action UnderstandingEmotion Recognition	—Unverified
Hybrid Agents for Image Restoration	Mar 13, 2025	Image RestorationIn-Context Learning	—Unverified
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance	Dec 9, 2024	Image GenerationLanguage Modeling	—Unverified
Imaginations of WALL-E : Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems	Aug 20, 2023	Emotion RecognitionLanguage Modelling	—Unverified
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models	Apr 14, 2025	Language ModelingLanguage Modelling	—Unverified
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks	Oct 24, 2024	image-classificationImage Classification	—Unverified
Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics	Jan 16, 2025	Large Language ModelMultimodal Large Language Model	—Unverified
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models	Jan 3, 2025	Binary ClassificationFace Anti-Spoofing	—Unverified
Investigating the Catastrophic Forgetting in Multimodal Large Language Models	Sep 19, 2023	image-classificationImage Classification	—Unverified
Is your multimodal large language model a good science tutor?	May 9, 2025	Language ModelingLanguage Modelling	—Unverified
J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM	Dec 20, 2024	Language ModelingLanguage Modelling	—Unverified
KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model	Jul 15, 2025	Keypoint DetectionLanguage Modeling	—Unverified
Language Is Not All You Need: Aligning Perception with Language Models	Feb 27, 2023	AllImage Captioning	—Unverified
Learning Free Token Reduction for Multi-Modal Large Language Models	Jan 29, 2025	Language ModelingLanguage Modelling	—Unverified
LEGION: Learning to Ground and Explain for Synthetic Image Detection	Mar 19, 2025	Artifact DetectionImage Manipulation	—Unverified
Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring	Feb 16, 2025	Instance SegmentationLanguage Modeling	—Unverified
Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition	Mar 10, 2025	Disaster ResponseLarge Language Model	—Unverified
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning	May 22, 2025	Language ModelingLanguage Modelling	—Unverified
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education	Feb 9, 2024	BenchmarkingChatbot	—Unverified
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs	Jan 29, 2024	Language ModellingLarge Language Model	—Unverified
LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding	Jan 9, 2025	Language ModelingLanguage Modelling	—Unverified
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models	Jul 27, 2024	Language ModelingLanguage Modelling	—Unverified
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound	Oct 19, 2024	Instruction FollowingKnowledge Distillation	—Unverified
LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer	Jul 15, 2025	DiagnosticLarge Language Model	—Unverified
Lumos : Empowering Multimodal LLMs with Scene Text Recognition	Feb 12, 2024	Language ModelingLanguage Modelling	—Unverified
Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation	Dec 17, 2024	Language ModelingLanguage Modelling	—Unverified
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation	Dec 24, 2023	Common Sense ReasoningLanguage Modeling	—Unverified
Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment	Apr 10, 2025	AI AgentAttribute	—Unverified
Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval	Jun 28, 2025	Cross-Modal RetrievalImage Captioning	—Unverified
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model	Aug 22, 2024	Language ModelingLanguage Modelling	—Unverified
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model	Apr 14, 2025	Computational EfficiencyLanguage Modeling	—Unverified
Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model	Nov 19, 2024	Language ModelingLanguage Modelling	—Unverified
MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation	Dec 4, 2023	Instruction FollowingLanguage Modeling	—Unverified
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery	Feb 28, 2024	Knowledge DistillationLanguage Modeling	—Unverified
MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling	May 21, 2025	Emotion RecognitionFace Detection	—Unverified
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction	Jan 10, 2025	Instruction FollowingLanguage Modeling	—Unverified

Show:10 25 50

← PrevPage 6 of 7Next →

No leaderboard results yet.