Multimodal Large Language Model

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 201–225 of 347 papers

Title	Date	Tasks	Status
Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips	Oct 1, 2023	Language ModelingLanguage Modelling	—Unverified
CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step	Jul 6, 2025	DenoisingLarge Language Model	—Unverified
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model	Nov 19, 2024	Information RetrievalLanguage Modeling	—Unverified
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic	Jul 25, 2024	Image to textLanguage Modeling	—Unverified
Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference	Sep 18, 2024	Image CaptioningLarge Language Model	—Unverified
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation	Dec 10, 2024	Image GenerationLanguage Modelling	—Unverified
Distraction is All You Need for Multimodal Large Language Model Jailbreaking	Feb 15, 2025	AllLanguage Modeling	—Unverified
DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing	Sep 2, 2024	Image GenerationLanguage Modelling	—Unverified
DreamJourney: Perpetual View Generation with Video Diffusion Models	Jun 21, 2025	Image to 3DLarge Language Model	—Unverified
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation	Dec 4, 2024	Image GenerationLarge Language Model	—Unverified
EAGLE: Egocentric AGgregated Language-video Engine	Sep 26, 2024	Action RecognitionActivity Recognition	—Unverified
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM	Dec 12, 2024	Image ComprehensionImage Generation	—Unverified
EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM	Dec 5, 2024	Image ManipulationLanguage Modeling	—Unverified
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model	Aug 21, 2024	Computational EfficiencyLanguage Modeling	—Unverified
Efficient Indirect LLM Jailbreak via Multimodal-LLM Jailbreak	May 30, 2024	Language ModelingLanguage Modelling	—Unverified
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios	Dec 5, 2024	Language ModelingLanguage Modelling	—Unverified
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model	Dec 5, 2023	Boundary DetectionLanguage Modeling	—Unverified
EventVL: Understand Event Streams via Multimodal Large Language Model	Jan 23, 2025	Event-based visionLanguage Modeling	—Unverified
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Dec 6, 2024	document understandingHallucination	—Unverified
FaceInsight: A Multimodal Large Language Model for Face Perception	Apr 22, 2025	Language ModelingLanguage Modelling	—Unverified
Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning	Apr 9, 2025	Action Unit DetectionAge Estimation	—Unverified
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms	Oct 24, 2024	DiversityLanguage Modeling	—Unverified
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization	Oct 14, 2024	Explanation GenerationImage Forgery Detection	—Unverified
From Street Views to Urban Science: Discovering Road Safety Factors with Multimodal Large Language Models	Jun 2, 2025	Large Language ModelMultimodal Large Language Model	—Unverified
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing	Jul 8, 2024	Image GenerationLanguage Modeling	—Unverified

Show:10 25 50

← PrevPage 9 of 14Next →

No leaderboard results yet.