Multimodal Large Language Model

Papers


Showing 101–125 of 347 papers

Title	Date	Tasks	Status	Hype
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy	Feb 27, 2025	Large Language Model, Minecraft	Unverified	0
AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs	Feb 27, 2025	Language Modeling	Code Available	3
Introducing Visual Perception Token into Multimodal Large Language Model	Feb 24, 2025	Language Modeling	Code Available	2
R1-Onevision: An Open-Source Multimodal Large Language Model Capable of Deep Reasoning	Feb 24, 2025	Language Modeling	Code Available	4
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models	Feb 22, 2025	Document Understanding, Key Information Extraction	Unverified	0
Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders	Feb 18, 2025	Automatic Speech Recognition (ASR)	Unverified	0
Towards Text-Image Interleaved Retrieval	Feb 18, 2025	Information Retrieval, Language Modeling	Code Available	1
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation	Feb 17, 2025	Language Modeling	Unverified	0
Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring	Feb 16, 2025	Instance Segmentation, Language Modeling	Unverified	0
Distraction is All You Need for Multimodal Large Language Model Jailbreaking	Feb 15, 2025	All, Language Modeling	Unverified	0
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data	Feb 12, 2025	Cross-Modal Alignment, Large Language Model	Code Available	2
On Fairness of Unified Multimodal Large Language Model for Image Generation	Feb 5, 2025	Fairness, Image Generation	Unverified	0
MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving	Feb 4, 2025	Language Modeling	Unverified	0
Leveraging Multimodal LLM for Inspirational User Interface Search	Jan 29, 2025	Language Modeling	Code Available	0
Learning Free Token Reduction for Multi-Modal Large Language Models	Jan 29, 2025	Language Modeling	Unverified	0
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding	Jan 25, 2025	Action Understanding, Emotion Recognition	Unverified	0
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures	Jan 25, 2025	Large Language Model, Multimodal Large Language Model	Code Available	1
EventVL: Understand Event Streams via Multimodal Large Language Model	Jan 23, 2025	Event-based Vision, Language Modeling	Unverified	0
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model	Jan 21, 2025	Image Generation, Instruction Following	Code Available	3
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery	Jan 20, 2025	Language Modeling	Code Available	1
When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis	Jan 17, 2025	Large Language Model, Multimodal Large Language Model	Code Available	1
Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics	Jan 16, 2025	Large Language Model, Multimodal Large Language Model	Unverified	0
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding	Jan 14, 2025	Feature Compression, Language Modeling	Code Available	2
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding	Jan 14, 2025	Language Modeling	Code Available	1
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks	Jan 14, 2025	Language Modeling	Unverified	0

