Multimodal Large Language Model

Papers

Title	Date	Tasks	Status	Hype
MMMModal -- Multi-Images Multi-Audio Multi-turn Multi-Modal	Feb 17, 2024	Language Modeling, Language Modelling	Unverified	0
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast	Feb 13, 2024	Language Modelling, Large Language Model	Code Available	2
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks	Feb 13, 2024	Language Modeling, Language Modelling	Unverified	0
Lumos: Empowering Multimodal LLMs with Scene Text Recognition	Feb 12, 2024	Language Modeling, Language Modelling	Unverified	0
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education	Feb 9, 2024	Benchmarking, Chatbot	Unverified	0
Jailbreaking Attack against Multimodal Large Language Model	Feb 4, 2024	Language Modeling, Language Modelling	Code Available	2
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering	Feb 4, 2024	Language Modeling, Language Modelling	Code Available	2
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs	Jan 29, 2024	Language Modelling, Large Language Model	Unverified	0
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion	Jan 24, 2024	Conditional Image Generation, Denoising	Unverified	0
MLLMReID: Multimodal Large Language Model-based Person Re-identification	Jan 24, 2024	Language Modeling, Language Modelling	Unverified	0
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences	Jan 19, 2024	Language Modeling, Language Modelling	Code Available	1
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning	Jan 19, 2024	Language Modeling, Language Modelling	Code Available	2
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation	Jan 1, 2024	Image Generation, Language Modeling	Unverified	0
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge	Jan 1, 2024	Language Modeling, Language Modelling	Code Available	2
AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework	Dec 31, 2023	Large Language Model, Multimodal Large Language Model	Code Available	1
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones	Dec 28, 2023	Computational Efficiency, Image Captioning	Code Available	3
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation	Dec 24, 2023	Common Sense Reasoning, Language Modeling	Unverified	0
StarVector: Generating Scalable Vector Graphics Code from Images and Text	Dec 17, 2023	Code Generation, Language Modeling	Code Available	5
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model	Dec 12, 2023	Contrastive Learning, Hallucination	Code Available	1
Audio-Visual LLM for Video Understanding	Dec 11, 2023	AudioCaps, Language Modeling	Unverified	0
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model	Dec 5, 2023	Boundary Detection, Language Modeling	Unverified	0
MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation	Dec 4, 2023	Instruction Following, Language Modeling	Unverified	0
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding	Dec 4, 2023	Dense Captioning, Highlight Detection	Code Available	2
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model	Nov 30, 2023	Language Modeling, Language Modelling	Code Available	0
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation	Nov 30, 2023	Image Generation, In-Context Learning	Unverified	0
LLMGA: Multimodal Large Language Model based Generation Assistant	Nov 27, 2023	Image Generation, Language Modeling	Code Available	2
GPT4Video: A Unified Multimodal Large Language Model for lnstruction-Followed Understanding and Safety-Aware Generation	Nov 25, 2023	Instruction Following, Language Modeling	Unverified	0
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge	Nov 20, 2023	Language Modeling, Language Modelling	Code Available	1
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model	Nov 10, 2023	Image Captioning, Language Modeling	Unverified	0
Chain of Images for Intuitively Reasoning	Nov 9, 2023	Common Sense Reasoning, Language Modelling	Code Available	1
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V	Oct 29, 2023	Diagnostic, Language Modeling	Code Available	1
CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images	Oct 22, 2023	Diagnostic, Language Modeling	Code Available	1
Multimodal Large Language Model for Visual Navigation	Oct 12, 2023	Language Modeling, Language Modelling	Unverified	0
Ferret: Refer and Ground Anything Anywhere at Any Granularity	Oct 11, 2023	Hallucination, Language Modeling	Code Available	5
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model	Oct 8, 2023	Decoder, Language Modeling	Code Available	1
Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips	Oct 1, 2023	Language Modeling, Language Modelling	Unverified	0
Investigating the Catastrophic Forgetting in Multimodal Large Language Models	Sep 19, 2023	image-classification, Image Classification	Unverified	0
Imaginations of WALL-E: Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems	Aug 20, 2023	Emotion Recognition, Language Modelling	Unverified	0
FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis	Jul 31, 2023	Language Modeling, Language Modelling	Code Available	1
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning	Jul 18, 2023	Instruction Following, Language Modeling	Unverified	0
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding	Jul 4, 2023	document understanding, Language Modeling	Code Available	0
Kosmos-2: Grounding Multimodal Large Language Models to the World	Jun 26, 2023	Image Captioning, In-Context Learning	Code Available	1
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models	Jun 23, 2023	Benchmarking, Language Modeling	Code Available	2
A Survey on Multimodal Large Language Models	Jun 23, 2023	Hallucination, In-Context Learning	Code Available	0
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks	Jun 7, 2023	Cross-Modal Retrieval, Language Modelling	Code Available	2
LMEye: An Interactive Perception Network for Large Language Models	May 5, 2023	Language Modelling, Large Language Model	Code Available	1
Language Is Not All You Need: Aligning Perception with Language Models	Feb 27, 2023	All, Image Captioning	Code Available	0
