SOTAVerified

Multimodal Large Language Model

Papers

Showing 301–325 of 347 papers

Title | Status | Hype
MMMModal -- Multi-Images Multi-Audio Multi-turn Multi-Modal | - | 0
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Code | 2
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | - | 0
Lumos: Empowering Multimodal LLMs with Scene Text Recognition | - | 0
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education | - | 0
Jailbreaking Attack against Multimodal Large Language Model | Code | 2
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering | Code | 2
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs | - | 0
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion | - | 0
MLLMReID: Multimodal Large Language Model-based Person Re-identification | - | 0
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences | Code | 1
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning | Code | 2
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation | - | 0
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | Code | 2
AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework | Code | 1
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | Code | 3
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation | - | 0
StarVector: Generating Scalable Vector Graphics Code from Images and Text | Code | 5
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model | Code | 1
Audio-Visual LLM for Video Understanding | - | 0
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model | - | 0
MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation | - | 0
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model | - | 0
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation | - | 0
Page 13 of 14

No leaderboard results yet.