SOTAVerified

Multimodal Large Language Model

Papers

Showing 201–250 of 347 papers

Title | Status | Hype
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | - | 0
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy | - | 0
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Code | 0
Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders | - | 0
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation | - | 0
Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring | - | 0
Distraction is All You Need for Multimodal Large Language Model Jailbreaking | - | 0
On Fairness of Unified Multimodal Large Language Model for Image Generation | - | 0
MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving | - | 0
Leveraging Multimodal LLM for Inspirational User Interface Search | Code | 0
Learning Free Token Reduction for Multi-Modal Large Language Models | - | 0
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding | - | 0
EventVL: Understand Event Streams via Multimodal Large Language Model | - | 0
Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics | - | 0
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | - | 0
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | - | 0
LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding | - | 0
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | - | 0
Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform | - | 0
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation | - | 0
GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model | - | 0
ST^3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming | - | 0
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Code | 0
A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization | - | 0
SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults | - | 0
J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM | - | 0
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering | Code | 0
Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation | - | 0
MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond | - | 0
A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges | - | 0
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | - | 0
COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework | - | 0
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation | - | 0
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance | - | 0
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Code | 0
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | - | 0
EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM | - | 0
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation | - | 0
ObjectFinder: An Open-Vocabulary Assistive System for Interactive Object Search by Blind People | - | 0
WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image | - | 0
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models | - | 0
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model | - | 0
Realistic Corner Case Generation for Autonomous Vehicles with Multimodal Large Language Model | - | 0
Multimodal large language model for wheat breeding: a new exploration of smart breeding | - | 0
StreetviewLLM: Extracting Geographic Information Using a Chain-of-Thought Multimodal Large Language Model | - | 0
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model | - | 0
Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model | - | 0
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Code | 0
Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Code | 0
Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization | - | 0
Page 5 of 7

No leaderboard results yet.