Multimodal Large Language Model

Papers

Showing 51–100 of 347 papers

Title	Date	Tasks	Status	Hype
UrbanWorld: An Urban World Model for 3D City Generation	Jul 16, 2024	Decision Making, Language Modelling	Code Available	2
Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM	Jun 18, 2024	Anomaly Detection, Anomaly Localization	Code Available	2
Explore the Limits of Omni-modal Pretraining at Scale	Jun 13, 2024	Language Modeling, Language Modelling	Code Available	2
A Survey of Multimodal Large Language Model from A Data-centric Perspective	May 26, 2024	Language Modeling, Language Modelling	Code Available	2
WorldGPT: Empowering LLM as Multimodal World Model	Apr 28, 2024	Language Modeling, Language Modelling	Code Available	2
Paint by Inpaint: Learning to Add Image Objects by Removing Them First	Apr 28, 2024	Image Inpainting, Language Modeling	Code Available	2
LaVy: Vietnamese Multimodal Large Language Model	Apr 11, 2024	Language Modeling, Language Modelling	Code Available	2
UMBRAE: Unified Multimodal Brain Decoding	Apr 10, 2024	Brain Decoding, Language Modeling	Code Available	2
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want	Mar 29, 2024	Instruction Following, Language Modelling	Code Available	2
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios	Mar 7, 2024	Audio-visual Question Answering, Audio-Visual Question Answering (AVQA)	Code Available	2
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast	Feb 13, 2024	Language Modelling, Large Language Model	Code Available	2
Jailbreaking Attack against Multimodal Large Language Model	Feb 4, 2024	Language Modeling, Language Modelling	Code Available	2
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering	Feb 4, 2024	Language Modeling, Language Modelling	Code Available	2
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning	Jan 19, 2024	Language Modeling, Language Modelling	Code Available	2
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge	Jan 1, 2024	Language Modeling, Language Modelling	Code Available	2
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding	Dec 4, 2023	Dense Captioning, Highlight Detection	Code Available	2
LLMGA: Multimodal Large Language Model based Generation Assistant	Nov 27, 2023	Image Generation, Language Modeling	Code Available	2
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models	Jun 23, 2023	Benchmarking, Language Modeling	Code Available	2
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks	Jun 7, 2023	Cross-Modal Retrieval, Language Modelling	Code Available	2
MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis	Jun 23, 2025	Diagnostic, Large Language Model	Code Available	1
The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units	Jun 19, 2025	Large Language Model, Multimodal Large Language Model	Code Available	1
un^2CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP	May 30, 2025	Large Language Model, Multimodal Large Language Model	Code Available	1
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model	May 30, 2025	Language Modeling, Language Modelling	Code Available	1
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution	May 27, 2025	8k, Avg	Code Available	1
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging	May 26, 2025	Language Modeling, Language Modelling	Code Available	1
Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion	May 26, 2025	Denoising, Image Generation	Code Available	1
ChemMLLM: Chemical Multimodal Large Language Model	May 22, 2025	Language Modeling, Language Modelling	Code Available	1
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation	May 19, 2025	Binary Classification, DeepFake Detection	Code Available	1
Unifying Segment Anything in Microscopy with Multimodal Large Language Model	May 16, 2025	Language Modeling, Language Modelling	Code Available	1
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding	Apr 17, 2025	Image Generation, Large Language Model	Code Available	1
AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection	Apr 16, 2025	Anomaly Detection, Large Language Model	Code Available	1
Enhancing Time Series Forecasting via Multi-Level Text Alignment with LLMs	Apr 10, 2025	Multimodal Large Language Model, Time Series	Code Available	1
Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions	Mar 20, 2025	2D Object Detection, Distributed Computing	Code Available	1
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space	Mar 14, 2025	Language Modeling, Language Modelling	Code Available	1
Towards General Visual-Linguistic Face Forgery Detection(V2)	Feb 28, 2025	Hallucination, Language Modeling	Code Available	1
Towards Text-Image Interleaved Retrieval	Feb 18, 2025	Information Retrieval, Language Modeling	Code Available	1
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures	Jan 25, 2025	Large Language Model, Multimodal Large Language Model	Code Available	1
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery	Jan 20, 2025	Language Modeling, Language Modelling	Code Available	1
When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis	Jan 17, 2025	Large Language Model, Multimodal Large Language Model	Code Available	1
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding	Jan 14, 2025	Language Modeling, Language Modelling	Code Available	1
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering	Jan 1, 2025	Large Language Model, Multimodal Large Language Model	Code Available	1
MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection	Dec 20, 2024	Cancer Classification, Chatbot	Code Available	1
IDEA-Bench: How Far are Generative Models from Professional Designing?	Dec 16, 2024	Large Language Model, Multimodal Large Language Model	Code Available	1
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations	Dec 9, 2024	Language Modeling, Language Modelling	Code Available	1
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning	Dec 4, 2024	Multimodal Large Language Model, Video Understanding	Code Available	1
Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning	Nov 18, 2024	Attribute, Compositional Zero-Shot Learning	Code Available	1
Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model	Nov 16, 2024	Language Modeling, Language Modelling	Code Available	1
Meaning Typed Prompting: A Technique for Efficient, Reliable Structured Output Generation	Oct 22, 2024	Large Language Model, Multimodal Large Language Model	Code Available	1
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation	Oct 17, 2024	Decision Making, Language Modeling	Code Available	1
Hespi: A pipeline for automatically detecting information from herbarium specimen sheets	Oct 11, 2024	Handwritten Text Recognition, HTR	Code Available	1