SOTAVerified

Multimodal Large Language Model

Papers

Showing 126–150 of 347 papers

Title | Status | Hype
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | - | 0
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Code | 2
Valley2: Exploring Multimodal Models with Scalable Vision-Language Design | Code | 3
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | - | 0
LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding | - | 0
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | - | 0
GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model | - | 0
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering | Code | 1
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation | - | 0
Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform | - | 0
ST^3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming | - | 0
MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Code | 0
A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization | - | 0
SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults | - | 0
MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection | Code | 1
J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM | - | 0
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering | Code | 0
Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation | - | 0
A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges | - | 0
IDEA-Bench: How Far are Generative Models from Professional Designing? | Code | 1
MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond | - | 0
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | - | 0
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Code | 2
COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework | - | 0
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation | - | 0

No leaderboard results yet.