SOTA Verified

Multimodal Large Language Model

Papers

Showing 201-225 of 347 papers

| Title | Status | Hype |
|---|---|---|
| Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips | | 0 |
| CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step | | 0 |
| CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model | | 0 |
| Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | | 0 |
| Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference | | 0 |
| DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation | | 0 |
| Distraction is All You Need for Multimodal Large Language Model Jailbreaking | | 0 |
| DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing | | 0 |
| DreamJourney: Perpetual View Generation with Video Diffusion Models | | 0 |
| DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation | | 0 |
| EAGLE: Egocentric AGgregated Language-video Engine | | 0 |
| EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | | 0 |
| EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM | | 0 |
| EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | | 0 |
| Efficient Indirect LLM Jailbreak via Multimodal-LLM Jailbreak | | 0 |
| EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | | 0 |
| EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model | | 0 |
| EventVL: Understand Event Streams via Multimodal Large Language Model | | 0 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | | 0 |
| FaceInsight: A Multimodal Large Language Model for Face Perception | | 0 |
| Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning | | 0 |
| Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | | 0 |
| ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization | | 0 |
| From Street Views to Urban Science: Discovering Road Safety Factors with Multimodal Large Language Models | | 0 |
| GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing | | 0 |

No leaderboard results yet.