SOTAVerified

Multimodal Large Language Model

Papers

Showing 201-250 of 347 papers

Title | Status | Hype
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | | 0
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization | | 0
From Street Views to Urban Science: Discovering Road Safety Factors with Multimodal Large Language Models | | 0
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing | | 0
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | | 0
Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders | | 0
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation | | 0
Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models | | 0
GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model | | 0
Guard Me If You Know Me: Protecting Specific Face-Identity from Deepfakes | | 0
Guardrails for avoiding harmful medical product recommendations and off-label promotion in generative AI models | | 0
GUIDE: Graphical User Interface Data for Execution | | 0
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition | | 0
HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model | | 0
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval | | 0
HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning | | 0
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model | | 0
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification | | 0
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding | | 0
Hybrid Agents for Image Restoration | | 0
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance | | 0
Imaginations of WALL-E: Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems | | 0
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks | | 0
Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics | | 0
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | | 0
Investigating the Catastrophic Forgetting in Multimodal Large Language Models | | 0
Is your multimodal large language model a good science tutor? | | 0
J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM | | 0
KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model | | 0
Learning Free Token Reduction for Multi-Modal Large Language Models | | 0
LEGION: Learning to Ground and Explain for Synthetic Image Detection | | 0
Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring | | 0
Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition | | 0
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning | | 0
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education | | 0
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs | | 0
LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding | | 0
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models | | 0
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound | | 0
LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer | | 0
Lumos: Empowering Multimodal LLMs with Scene Text Recognition | | 0
Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation | | 0
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation | | 0
Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment | | 0
Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval | | 0
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model | | 0
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model | | 0
Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model | | 0
MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation | | 0
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery | | 0

No leaderboard results yet.