Multimodal Large Language Model

Papers


Showing 51–100 of 347 papers

Title	Date	Tasks	Status	Hype	Score
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents	May 21, 2025	Large Language Model, Multimodal Large Language Model	Code Available	2	5
UrbanWorld: An Urban World Model for 3D City Generation	Jul 16, 2024	Decision Making, Language Modelling	Code Available	2	5
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model	Mar 6, 2025	General Knowledge, Image Captioning	Code Available	2	5
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine	Dec 12, 2024	Language Modeling, Language Modelling	Code Available	2	5
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation	Jan 11, 2025	Chart Understanding, Code Generation	Code Available	2	5
Jailbreaking Attack against Multimodal Large Language Model	Feb 4, 2024	Language Modeling, Language Modelling	Code Available	2	5
Paint by Inpaint: Learning to Add Image Objects by Removing Them First	Apr 28, 2024	Image Inpainting, Language Modeling	Code Available	2	5
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning	Jan 19, 2024	Language Modeling, Language Modelling	Code Available	2	5
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer	Apr 14, 2025	Language Modeling, Language Modelling	Code Available	2	5
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want	Mar 29, 2024	Instruction Following, Language Modelling	Code Available	2	5
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation	Jul 19, 2024	Attribute, Language Modeling	Code Available	2	5
Introducing Visual Perception Token into Multimodal Large Language Model	Feb 24, 2025	Language Modeling, Language Modelling	Code Available	2	5
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding	Jan 14, 2025	image-classification, Image Classification	Code Available	2	5
A Survey of Multimodal Large Language Model from A Data-centric Perspective	May 26, 2024	Language Modeling, Language Modelling	Code Available	2	5
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding	May 22, 2025	Language Modeling, Language Modelling	Code Available	2	5
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification	Nov 11, 2024	Large Language Model, Multimodal Large Language Model	Code Available	2	5
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding	Dec 4, 2023	Dense Captioning, Highlight Detection	Code Available	2	5
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios	Mar 7, 2024	Audio-visual Question Answering, Audio-Visual Question Answering (AVQA)	Code Available	2	5
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks	Jun 7, 2023	Cross-Modal Retrieval, Language Modelling	Code Available	2	5
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering	Jan 1, 2025	Large Language Model, Multimodal Large Language Model	Code Available	1	5
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space	Mar 14, 2025	Language Modeling, Language Modelling	Code Available	1	5
Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model	Nov 16, 2024	Language Modeling, Language Modelling	Code Available	1	5
DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution	Jun 24, 2024	Image Restoration, Image Super-Resolution	Code Available	1	5
Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion	May 26, 2025	Denoising, Image Generation	Code Available	1	5
CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images	Oct 22, 2023	Diagnostic, Language Modeling	Code Available	1	5
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation	May 19, 2025	Binary Classification, DeepFake Detection	Code Available	1	5
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception	Mar 5, 2024	Language Modeling, Language Modelling	Code Available	1	5
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding	Apr 17, 2025	Image Generation, Large Language Model	Code Available	1	5
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models	Aug 30, 2024	Image Captioning, Language Modeling	Code Available	1	5
A Refer-and-Ground Multimodal Large Language Model for Biomedicine	Jun 26, 2024	Language Modeling, Language Modelling	Code Available	1	5
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V	Oct 29, 2023	Diagnostic, Language Modeling	Code Available	1	5
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model	Jun 17, 2024	Language Modeling, Language Modelling	Code Available	1	5
MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection	Dec 20, 2024	Cancer Classification, Chatbot	Code Available	1	5
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation	Oct 17, 2024	Decision Making, Language Modeling	Code Available	1	5
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding	Jan 14, 2025	Language Modeling, Language Modelling	Code Available	1	5
MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis	Jun 23, 2025	Diagnostic, Large Language Model	Code Available	1	5
From Text to Pixel: Advancing Long-Context Understanding in MLLMs	May 23, 2024	Language Modeling, Language Modelling	Code Available	1	5
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences	Jan 19, 2024	Language Modeling, Language Modelling	Code Available	1	5
LMEye: An Interactive Perception Network for Large Language Models	May 5, 2023	Language Modelling, Large Language Model	Code Available	1	5
FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis	Jul 31, 2023	Language Modeling, Language Modelling	Code Available	1	5
FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant	Aug 19, 2024	Descriptive, Face Swapping	Code Available	1	5
Meaning Typed Prompting: A Technique for Efficient, Reliable Structured Output Generation	Oct 22, 2024	Large Language Model, Multimodal Large Language Model	Code Available	1	5
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution	May 27, 2025	8k, Avg	Code Available	1	5
AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection	Apr 16, 2025	Anomaly Detection, Large Language Model	Code Available	1	5
ChemMLLM: Chemical Multimodal Large Language Model	May 22, 2025	Language Modeling, Language Modelling	Code Available	1	5
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations	Dec 9, 2024	Language Modeling, Language Modelling	Code Available	1	5
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models	Apr 1, 2024	Decision Making, Language Modeling	Code Available	1	5
LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors	Jun 20, 2024	16k, Instruction Following	Code Available	1	5
LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge	Nov 20, 2023	Language Modeling, Language Modelling	Code Available	1	5
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures	Jan 25, 2025	Large Language Model, Multimodal Large Language Model	Code Available	1	5

