SOTAVerified

Multimodal Large Language Model

Papers

Showing 301-325 of 347 papers

| Title | Status | Hype |
| --- | --- | --- |
| Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization | | 0 |
| MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning | | 0 |
| MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | | 0 |
| MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval | | 0 |
| MLLMReID: Multimodal Large Language Model-based Person Re-identification | | 0 |
| MMMModal -- Multi-Images Multi-Audio Multi-turn Multi-Modal | | 0 |
| MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation | | 0 |
| MobileFlow: A Multimodal LLM For Mobile GUI Agent | | 0 |
| MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description | | 0 |
| MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills | | 0 |
| MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models | | 0 |
| MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving | | 0 |
| mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | | 0 |
| mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model | | 0 |
| MRIR: Integrating Multimodal Insights for Diffusion-based Realistic Image Restoration | | 0 |
| MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception | | 0 |
| Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles | | 0 |
| Multimodal Large Language Model for Visual Navigation | | 0 |
| Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation | | 0 |
| Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model | | 0 |
| Multimodal Transformer for Comics Text-Cloze | | 0 |
| ObjectFinder: An Open-Vocabulary Assistive System for Interactive Object Search by Blind People | | 0 |
| OCC-MLLM:Empowering Multimodal Large Language Model For the Understanding of Occluded Objects | | 0 |
| OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning | | 0 |
| OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | | 0 |

No leaderboard results yet.