Multimodal Large Language Model

Papers


Showing 301–325 of 347 papers

Title	Date	Tasks	Status
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion	Jan 24, 2024	Conditional Image Generation, Denoising	Unverified
Universal Item Tokenization for Transferable Generative Recommendation	Apr 6, 2025	General Knowledge, Large Language Model	Unverified
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning	May 20, 2025	Large Language Model, Multimodal Large Language Model	Unverified
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation	Mar 19, 2025	Language Model Evaluation, Language Modeling	Unverified
VGR: Visual Grounded Reasoning	Jun 13, 2025	Large Language Model, Math	Unverified
Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model	Aug 21, 2024	Emotion Recognition, Language Modeling	Unverified
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition	May 7, 2024	Large Language Model, Multimodal Large Language Model	Unverified
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese	Aug 22, 2024	Language Modeling	Unverified
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks	Feb 13, 2024	Language Modeling	Unverified
Visual Text Generation in the Wild	Jul 19, 2024	Language Modeling, Large Language Model	Unverified
ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation	Oct 11, 2024	Diagnostic, Language Modeling	Unverified
VL-Mamba: Exploring State Space Models for Multimodal Learning	Mar 20, 2024	Language Modeling	Unverified
VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection	Sep 30, 2024	Anomaly Detection, Language Modeling	Unverified
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks	Jul 29, 2024	Deep Learning, Domain Generalization	Unverified
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach	Oct 31, 2024	Language Modeling	Unverified
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models	May 26, 2025	Language Modeling	Unverified
When neural implant meets multimodal LLM: A dual-loop system for neuromodulation and naturalistic neuralbehavioral research	Mar 16, 2025	EEG, Large Language Model	Unverified
WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image	Dec 3, 2024	Diagnostic, Language Modeling	Unverified
LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery	Feb 26, 2024	Continual Learning, Exemplar-Free	Code Available
Leveraging Multimodal LLM for Inspirational User Interface Search	Jan 29, 2025	Language Modeling	Code Available
Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning	Nov 17, 2024	Image Captioning, Language Modeling	Code Available
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models	May 26, 2025	Image Classification	Code Available
Consistency-aware Fake Videos Detection on Short Video Platforms	Apr 30, 2025	Large Language Model, Multimodal Large Language Model	Code Available
SCA: Improve Semantic Consistent in Unrestricted Adversarial Attacks via DDPM Inversion	Oct 3, 2024	Adversarial Attack, Denoising	Code Available
Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models	Oct 15, 2024	Hallucination, Large Language Model	Code Available


