SOTAVerified

Multimodal Large Language Model

Papers

Showing 251–300 of 347 papers

| Title | Status | Hype |
|---|---|---|
| J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM | | 0 |
| KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model | | 0 |
| Language Is Not All You Need: Aligning Perception with Language Models | | 0 |
| Learning Free Token Reduction for Multi-Modal Large Language Models | | 0 |
| LEGION: Learning to Ground and Explain for Synthetic Image Detection | | 0 |
| Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring | | 0 |
| Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition | | 0 |
| LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning | | 0 |
| LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education | | 0 |
| LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs | | 0 |
| LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding | | 0 |
| LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models | | 0 |
| LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound | | 0 |
| LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer | | 0 |
| Lumos : Empowering Multimodal LLMs with Scene Text Recognition | | 0 |
| Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation | | 0 |
| ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation | | 0 |
| Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment | | 0 |
| Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval | | 0 |
| MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model | | 0 |
| Mavors: Multi-granularity Video Representation for Multimodal Large Language Model | | 0 |
| Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model | | 0 |
| MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation | | 0 |
| MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery | | 0 |
| MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling | | 0 |
| MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | | 0 |
| Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization | | 0 |
| MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning | | 0 |
| MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | | 0 |
| MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval | | 0 |
| MLLMReID: Multimodal Large Language Model-based Person Re-identification | | 0 |
| MMMModal -- Multi-Images Multi-Audio Multi-turn Multi-Modal | | 0 |
| MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation | | 0 |
| MobileFlow: A Multimodal LLM For Mobile GUI Agent | | 0 |
| MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description | | 0 |
| MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills | | 0 |
| MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models | | 0 |
| MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving | | 0 |
| mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | | 0 |
| mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model | | 0 |
| MRIR: Integrating Multimodal Insights for Diffusion-based Realistic Image Restoration | | 0 |
| MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception | | 0 |
| Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles | | 0 |
| Multimodal Large Language Model for Visual Navigation | | 0 |
| TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | | 0 |
| MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond | | 0 |
| Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques | | 0 |
| Towards Visual Text Grounding of Multimodal Large Language Model | | 0 |
| Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security | | 0 |
| UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation | | 0 |

No leaderboard results yet.