Multimodal Large Language Model

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 301–347 of 347 papers

Title	Date	Tasks	Status
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion	Jan 24, 2024	Conditional Image GenerationDenoising	—Unverified
Universal Item Tokenization for Transferable Generative Recommendation	Apr 6, 2025	General KnowledgeLarge Language Model	—Unverified
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning	May 20, 2025	Large Language ModelMultimodal Large Language Model	—Unverified
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation	Mar 19, 2025	Language Model EvaluationLanguage Modeling	—Unverified
VGR: Visual Grounded Reasoning	Jun 13, 2025	Large Language ModelMath	—Unverified
Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model	Aug 21, 2024	Emotion RecognitionLanguage Modeling	—Unverified
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition	May 7, 2024	Large Language ModelMultimodal Large Language Model	—Unverified
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese	Aug 22, 2024	Language ModelingLanguage Modelling	—Unverified
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks	Feb 13, 2024	Language ModelingLanguage Modelling	—Unverified
Visual Text Generation in the Wild	Jul 19, 2024	Language ModellingLarge Language Model	—Unverified
ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation	Oct 11, 2024	DiagnosticLanguage Modeling	—Unverified
VL-Mamba: Exploring State Space Models for Multimodal Learning	Mar 20, 2024	Language ModelingLanguage Modelling	—Unverified
VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection	Sep 30, 2024	Anomaly DetectionLanguage Modeling	—Unverified
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks	Jul 29, 2024	Deep LearningDomain Generalization	—Unverified
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach	Oct 31, 2024	Language ModelingLanguage Modelling	—Unverified
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models	May 26, 2025	Language ModelingLanguage Modelling	—Unverified
When neural implant meets multimodal LLM: A dual-loop system for neuromodulation and naturalistic neuralbehavioral research	Mar 16, 2025	EEGLarge Language Model	—Unverified
WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image	Dec 3, 2024	DiagnosticLanguage Modeling	—Unverified
LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery	Feb 26, 2024	Continual LearningExemplar-Free	CodeCode Available
Leveraging Multimodal LLM for Inspirational User Interface Search	Jan 29, 2025	Language ModelingLanguage Modelling	CodeCode Available
Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning	Nov 17, 2024	Image CaptioningLanguage Modeling	CodeCode Available
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models	May 26, 2025	image-classificationImage Classification	CodeCode Available
Consistency-aware Fake Videos Detection on Short Video Platforms	Apr 30, 2025	Large Language ModelMultimodal Large Language Model	CodeCode Available
SCA: Improve Semantic Consistent in Unrestricted Adversarial Attacks via DDPM Inversion	Oct 3, 2024	Adversarial AttackDenoising	CodeCode Available
Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models	Oct 15, 2024	HallucinationLarge Language Model	CodeCode Available
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts	Nov 18, 2024	BenchmarkingMultimodal Large Language Model	CodeCode Available
OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography	Jun 26, 2025	DeciphermentLarge Language Model	CodeCode Available
TourSynbio-Search: A Large Language Model Driven Agent Framework for Unified Search Method for Protein Engineering	Nov 9, 2024	Information RetrievalLanguage Modeling	CodeCode Available
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering	Dec 19, 2024	Contrastive LearningLanguage Modeling	CodeCode Available
VideoQA in the Era of LLMs: An Empirical Study	Aug 8, 2024	Multimodal Large Language ModelVideo Question Answering	CodeCode Available
MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking	Apr 9, 2025	Autonomous DrivingLanguage Modeling	CodeCode Available
Cross-modal RAG: Sub-dimensional Retrieval-Augmented Text-to-Image Generation	May 28, 2025	Image GenerationLanguage Modeling	CodeCode Available
Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations	Oct 22, 2024	Camouflaged Object SegmentationLarge Language Model	CodeCode Available
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios	Dec 27, 2024	Autonomous DrivingLanguage Modeling	CodeCode Available
Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities	Apr 2, 2025	DescriptiveLarge Language Model	CodeCode Available
Layout Generation Agents with Large Language Models	May 13, 2024	Language ModelingLanguage Modelling	CodeCode Available
TRINS: Towards Multimodal Language Models that Can Read	Jun 10, 2024	Language ModelingLanguage Modelling	CodeCode Available
MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding	Sep 10, 2024	BenchmarkingLanguage Modeling	CodeCode Available
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO	May 19, 2025	DecoderImage Generation	CodeCode Available
Dynamic Pyramid Network for Efficient Multimodal Large Language Model	Mar 26, 2025	Language ModelingLanguage Modelling	CodeCode Available
MFGDiffusion: Mask-Guided Smoke Synthesis for Enhanced Forest Fire Detection	Jul 15, 2025	Fire DetectionImage Generation	CodeCode Available
VIS-Shepherd: Constructing Critic for LLM-based Data Visualization Generation	Jun 16, 2025	Data VisualizationLanguage Modeling	CodeCode Available
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model	May 28, 2024	Language ModelingLanguage Modelling	CodeCode Available
Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning	May 10, 2025	Image AugmentationLarge Language Model	CodeCode Available
V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM	May 24, 2024	Language ModellingLarge Language Model	CodeCode Available
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding	Aug 30, 2024	Language ModellingLarge Language Model	CodeCode Available
MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation	Sep 29, 2024	Language ModelingLanguage Modelling	CodeCode Available

Show:10 25 50

← PrevPage 7 of 7Next →

No leaderboard results yet.