SOTAVerified

Multimodal Large Language Model

Papers

Showing 201–225 of 347 papers

| Title | Status | Hype |
|---|---|---|
| CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | | 0 |
| Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy | | 0 |
| OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | | 0 |
| Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders | | 0 |
| MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation | | 0 |
| Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring | | 0 |
| Distraction is All You Need for Multimodal Large Language Model Jailbreaking | | 0 |
| On Fairness of Unified Multimodal Large Language Model for Image Generation | | 0 |
| MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving | | 0 |
| Leveraging Multimodal LLM for Inspirational User Interface Search | Code | 0 |
| Learning Free Token Reduction for Multi-Modal Large Language Models | | 0 |
| HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding | | 0 |
| EventVL: Understand Event Streams via Multimodal Large Language Model | | 0 |
| Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics | | 0 |
| Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | | 0 |
| MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | | 0 |
| LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding | | 0 |
| Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | | 0 |
| Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform | | 0 |
| S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation | | 0 |
| GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model | | 0 |
| ST^3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming | | 0 |
| MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Code | 0 |
| A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization | | 0 |
| SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults | | 0 |

No leaderboard results yet.