Multimodal Large Language Model

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 101–150 of 347 papers

Title	Date	Tasks	Status	Hype
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy	Feb 27, 2025	Large Language ModelMinecraft	—Unverified	0
AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs	Feb 27, 2025	Language ModelingLanguage Modelling	CodeCode Available	3
Introducing Visual Perception Token into Multimodal Large Language Model	Feb 24, 2025	Language ModelingLanguage Modelling	CodeCode Available	2
R1-Onevision：An Open-Source Multimodal Large Language Model Capable of Deep Reasoning	Feb 24, 2025	Language ModelingLanguage Modelling	CodeCode Available	4
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models	Feb 22, 2025	document understandingKey Information Extraction	CodeCode Available	0
Towards Text-Image Interleaved Retrieval	Feb 18, 2025	Information RetrievalLanguage Modeling	CodeCode Available	1
Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders	Feb 18, 2025	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation	Feb 17, 2025	Language ModelingLanguage Modelling	—Unverified	0
Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring	Feb 16, 2025	Instance SegmentationLanguage Modeling	—Unverified	0
Distraction is All You Need for Multimodal Large Language Model Jailbreaking	Feb 15, 2025	AllLanguage Modeling	—Unverified	0
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data	Feb 12, 2025	cross-modal alignmentLarge Language Model	CodeCode Available	2
On Fairness of Unified Multimodal Large Language Model for Image Generation	Feb 5, 2025	FairnessImage Generation	—Unverified	0
MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving	Feb 4, 2025	Language ModelingLanguage Modelling	—Unverified	0
Leveraging Multimodal LLM for Inspirational User Interface Search	Jan 29, 2025	Language ModelingLanguage Modelling	CodeCode Available	0
Learning Free Token Reduction for Multi-Modal Large Language Models	Jan 29, 2025	Language ModelingLanguage Modelling	—Unverified	0
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures	Jan 25, 2025	Large Language ModelMultimodal Large Language Model	CodeCode Available	1
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding	Jan 25, 2025	Action UnderstandingEmotion Recognition	—Unverified	0
EventVL: Understand Event Streams via Multimodal Large Language Model	Jan 23, 2025	Event-based visionLanguage Modeling	—Unverified	0
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model	Jan 21, 2025	Image GenerationInstruction Following	CodeCode Available	3
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery	Jan 20, 2025	Language ModelingLanguage Modelling	CodeCode Available	1
When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis	Jan 17, 2025	Large Language ModelMultimodal Large Language Model	CodeCode Available	1
Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics	Jan 16, 2025	Large Language ModelMultimodal Large Language Model	—Unverified	0
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding	Jan 14, 2025	Feature CompressionLanguage Modeling	CodeCode Available	2
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding	Jan 14, 2025	Language ModelingLanguage Modelling	CodeCode Available	1
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding	Jan 14, 2025	image-classificationImage Classification	CodeCode Available	2
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks	Jan 14, 2025	Language ModelingLanguage Modelling	—Unverified	0
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation	Jan 11, 2025	Chart UnderstandingCode Generation	CodeCode Available	2
Valley2: Exploring Multimodal Models with Scalable Vision-Language Design	Jan 10, 2025	Image CaptioningLanguage Modeling	CodeCode Available	3
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction	Jan 10, 2025	Instruction FollowingLanguage Modeling	—Unverified	0
LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding	Jan 9, 2025	Language ModelingLanguage Modelling	—Unverified	0
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models	Jan 3, 2025	Binary ClassificationFace Anti-Spoofing	—Unverified	0
GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model	Jan 1, 2025	AttributeLanguage Modeling	—Unverified	0
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering	Jan 1, 2025	Large Language ModelMultimodal Large Language Model	CodeCode Available	1
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation	Jan 1, 2025	Autonomous DrivingAutonomous Vehicles	—Unverified	0
Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform	Jan 1, 2025	Code GenerationImage Generation	—Unverified	0
ST^3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming	Dec 28, 2024	Language ModelingLanguage Modelling	—Unverified	0
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios	Dec 27, 2024	Autonomous DrivingLanguage Modeling	CodeCode Available	0
A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization	Dec 27, 2024	Face SwappingImage Segmentation	—Unverified	0
SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults	Dec 22, 2024	Data AugmentationFault Diagnosis	—Unverified	0
MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection	Dec 20, 2024	Cancer ClassificationChatbot	CodeCode Available	1
J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM	Dec 20, 2024	Language ModelingLanguage Modelling	—Unverified	0
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering	Dec 19, 2024	Contrastive LearningLanguage Modeling	CodeCode Available	0
Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation	Dec 17, 2024	Language ModelingLanguage Modelling	—Unverified	0
A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges	Dec 16, 2024	Language ModelingLanguage Modelling	—Unverified	0
IDEA-Bench: How Far are Generative Models from Professional Designing?	Dec 16, 2024	Large Language ModelMultimodal Large Language Model	CodeCode Available	1
MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond	Dec 16, 2024	Language ModelingLanguage Modelling	—Unverified	0
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM	Dec 12, 2024	Image ComprehensionImage Generation	—Unverified	0
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine	Dec 12, 2024	Language ModelingLanguage Modelling	CodeCode Available	2
COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework	Dec 11, 2024	GPULanguage Modeling	—Unverified	0
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation	Dec 10, 2024	Image GenerationLanguage Modelling	—Unverified	0

Show:10 25 50

← PrevPage 3 of 7Next →

No leaderboard results yet.