SOTAVerified

Multimodal Large Language Model

Papers

Showing 251–300 of 347 papers

Title | Status | Hype
MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception | | 0
Automated radiotherapy treatment planning guided by GPT-4Vision | | 0
LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors | Code | 1
The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge | | 0
Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM | Code | 2
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model | Code | 1
Explore the Limits of Omni-modal Pretraining at Scale | Code | 2
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Code | 5
Multimodal Table Understanding | Code | 3
TRINS: Towards Multimodal Language Models that Can Read | Code | 0
VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model | Code | 1
Ovis: Structural Embedding Alignment for Multimodal Large Language Model | Code | 5
Efficient Indirect LLM Jailbreak via Multimodal-LLM Jailbreak | | 0
Voice Jailbreak Attacks Against GPT-4o | Code | 1
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model | Code | 0
Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation | | 0
A Survey of Multimodal Large Language Model from A Data-centric Perspective | Code | 2
V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM | Code | 0
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability | | 0
From Text to Pixel: Advancing Long-Context Understanding in MLLMs | Code | 1
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | Code | 7
Layout Generation Agents with Large Language Models | Code | 0
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition | | 0
SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing | Code | 4
WorldGPT: Empowering LLM as Multimodal World Model | Code | 2
Paint by Inpaint: Learning to Add Image Objects by Removing Them First | Code | 2
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | Code | 0
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation | | 0
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | Code | 4
RAGAR, Your Falsehood Radar: RAG-Augmented Reasoning for Political Fact-Checking using Multimodal Large Language Models | | 0
Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Code | 3
LaVy: Vietnamese Multimodal Large Language Model | Code | 2
UMBRAE: Unified Multimodal Brain Decoding | Code | 2
GUIDE: Graphical User Interface Data for Execution | | 0
Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security | | 0
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation | Code | 3
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | Code | 4
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization | | 0
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models | Code | 1
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Code | 2
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition | | 0
VL-Mamba: Exploring State Space Models for Multimodal Learning | | 0
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization | | 0
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | Code | 2
Multimodal Transformer for Comics Text-Cloze | | 0
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception | Code | 1
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection | | 0
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery | | 0
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction | Code | 3
LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery | Code | 0

No leaderboard results yet.