Multimodal Large Language Model

Papers

Showing 101–150 of 347 papers

Title	Date	Tasks	Status	Hype
TextToucher: Fine-Grained Text-to-Touch Generation	Sep 9, 2024	Language Modelling, Large Language Model	Code Available	1
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models	Aug 30, 2024	Image Captioning, Language Modeling	Code Available	1
ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding	Aug 21, 2024	Language Modeling, Language Modelling	Code Available	1
Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation	Aug 19, 2024	Large Language Model, Multimodal Large Language Model	Code Available	1
FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant	Aug 19, 2024	Descriptive, Face Swapping	Code Available	1
Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions	Aug 5, 2024	Language Modeling, Language Modelling	Code Available	1
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model	Jul 23, 2024	Language Modeling, Language Modelling	Code Available	1
A Refer-and-Ground Multimodal Large Language Model for Biomedicine	Jun 26, 2024	Language Modeling, Language Modelling	Code Available	1
DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution	Jun 24, 2024	Image Restoration, Image Super-Resolution	Code Available	1
LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors	Jun 20, 2024	16k, Instruction Following	Code Available	1
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model	Jun 17, 2024	Language Modeling, Language Modelling	Code Available	1
VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model	Jun 3, 2024	Image Outpainting, Language Modeling	Code Available	1
Voice Jailbreak Attacks Against GPT-4o	May 29, 2024	Language Modelling, Large Language Model	Code Available	1
From Text to Pixel: Advancing Long-Context Understanding in MLLMs	May 23, 2024	Language Modeling, Language Modelling	Code Available	1
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models	Apr 1, 2024	Decision Making, Language Modeling	Code Available	1
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception	Mar 5, 2024	Language Modeling, Language Modelling	Code Available	1
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences	Jan 19, 2024	Language Modeling, Language Modelling	Code Available	1
AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework	Dec 31, 2023	Large Language Model, Multimodal Large Language Model	Code Available	1
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model	Dec 12, 2023	Contrastive Learning, Hallucination	Code Available	1
LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge	Nov 20, 2023	Language Modeling, Language Modelling	Code Available	1
Chain of Images for Intuitively Reasoning	Nov 9, 2023	Common Sense Reasoning, Language Modelling	Code Available	1
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V	Oct 29, 2023	Diagnostic, Language Modeling	Code Available	1
CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images	Oct 22, 2023	Diagnostic, Language Modeling	Code Available	1
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model	Oct 8, 2023	Decoder, Language Modeling	Code Available	1
FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis	Jul 31, 2023	Language Modeling, Language Modelling	Code Available	1
Kosmos-2: Grounding Multimodal Large Language Models to the World	Jun 26, 2023	Image Captioning, In-Context Learning	Code Available	1
LMEye: An Interactive Perception Network for Large Language Models	May 5, 2023	Language Modelling, Large Language Model	Code Available	1
LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer	Jul 15, 2025	Diagnostic, Large Language Model	Unverified	0
MFGDiffusion: Mask-Guided Smoke Synthesis for Enhanced Forest Fire Detection	Jul 15, 2025	Fire Detection, Image Generation	Code Available	0
KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model	Jul 15, 2025	Keypoint Detection, Language Modeling	Unverified	0
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI	Jul 14, 2025	Large Language Model, Multimodal Large Language Model	Unverified	0
TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model	Jul 8, 2025	Language Modeling, Language Modelling	Unverified	0
BlueLM-2.5-3B Technical Report	Jul 8, 2025	Large Language Model, Multimodal Large Language Model	Unverified	0
CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step	Jul 6, 2025	Denoising, Large Language Model	Unverified	0
Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval	Jun 28, 2025	Cross-Modal Retrieval, Image Captioning	Unverified	0
OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography	Jun 26, 2025	Decipherment, Large Language Model	Code Available	0
DreamJourney: Perpetual View Generation with Video Diffusion Models	Jun 21, 2025	Image to 3D, Large Language Model	Unverified	0
ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM	Jun 17, 2025	Hallucination, Language Modeling	Unverified	0
VIS-Shepherd: Constructing Critic for LLM-based Data Visualization Generation	Jun 16, 2025	Data Visualization, Language Modeling	Code Available	0
CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model	Jun 16, 2025	Decision Making, Financial Analysis	Unverified	0
VGR: Visual Grounded Reasoning	Jun 13, 2025	Large Language Model, Math	Unverified	0
PHRASED: Phrase Dictionary Biasing for Speech Translation	Jun 10, 2025	Language Modeling, Language Modelling	Unverified	0
Parking, Perception, and Retail: Street-Level Determinants of Community Vitality in Harbin	Jun 5, 2025	Large Language Model, Morphological Analysis	Unverified	0
Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques	Jun 5, 2025	cross-modal alignment, Large Language Model	Unverified	0
The NTNU System at the S&I Challenge 2025 SLA Open Track	Jun 5, 2025	Language Modeling, Language Modelling	Unverified	0
A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions	Jun 4, 2025	Data Augmentation, Diversity	Unverified	0
From Street Views to Urban Science: Discovering Road Safety Factors with Multimodal Large Language Models	Jun 2, 2025	Large Language Model, Multimodal Large Language Model	Unverified	0
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation	May 30, 2025	Autonomous Driving, Autonomous Vehicles	Unverified	0
Cross-modal RAG: Sub-dimensional Retrieval-Augmented Text-to-Image Generation	May 28, 2025	Image Generation, Language Modeling	Code Available	0
Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation	May 27, 2025	Large Language Model, Multimodal Large Language Model	Unverified	0
