SOTAVerified

Multimodal Large Language Model

Papers

Showing 301–347 of 347 papers

Title | Status | Hype
MMMModal -- Multi-Images Multi-Audio Multi-turn Multi-Modal | - | 0
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Code | 2
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | - | 0
Lumos: Empowering Multimodal LLMs with Scene Text Recognition | - | 0
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education | - | 0
Jailbreaking Attack against Multimodal Large Language Model | Code | 2
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering | Code | 2
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs | - | 0
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion | - | 0
MLLMReID: Multimodal Large Language Model-based Person Re-identification | - | 0
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences | Code | 1
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning | Code | 2
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation | - | 0
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | Code | 2
AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework | Code | 1
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | Code | 3
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation | - | 0
StarVector: Generating Scalable Vector Graphics Code from Images and Text | Code | 5
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model | Code | 1
Audio-Visual LLM for Video Understanding | - | 0
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model | - | 0
MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation | - | 0
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model | Code | 0
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation | - | 0
LLMGA: Multimodal Large Language Model based Generation Assistant | Code | 2
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation | - | 0
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | Code | 1
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model | - | 0
Chain of Images for Intuitively Reasoning | Code | 1
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V | Code | 1
CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images | Code | 1
Multimodal Large Language Model for Visual Navigation | - | 0
Ferret: Refer and Ground Anything Anywhere at Any Granularity | Code | 5
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model | Code | 1
Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips | - | 0
Investigating the Catastrophic Forgetting in Multimodal Large Language Models | - | 0
Imaginations of WALL-E: Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems | - | 0
FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis | Code | 1
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning | - | 0
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | Code | 0
Kosmos-2: Grounding Multimodal Large Language Models to the World | Code | 1
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models | Code | 2
A Survey on Multimodal Large Language Models | Code | 0
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks | Code | 2
LMEye: An Interactive Perception Network for Large Language Models | Code | 1
Language Is Not All You Need: Aligning Perception with Language Models | Code | 0

No leaderboard results yet.