Multimodal Large Language Model

Papers

Title	Date	Tasks	Status
Lumos: Empowering Multimodal LLMs with Scene Text Recognition	Feb 12, 2024	Language Modeling, Language Modelling	Unverified
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education	Feb 9, 2024	Benchmarking, Chatbot	Unverified
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs	Jan 29, 2024	Language Modelling, Large Language Model	Unverified
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion	Jan 24, 2024	Conditional Image Generation, Denoising	Unverified
MLLMReID: Multimodal Large Language Model-based Person Re-identification	Jan 24, 2024	Language Modeling, Language Modelling	Unverified
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation	Jan 1, 2024	Image Generation, Language Modeling	Unverified
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation	Dec 24, 2023	Common Sense Reasoning, Language Modeling	Unverified
Audio-Visual LLM for Video Understanding	Dec 11, 2023	AudioCaps, Language Modeling	Unverified
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model	Dec 5, 2023	Boundary Detection, Language Modeling	Unverified
MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation	Dec 4, 2023	Instruction Following, Language Modeling	Unverified
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation	Nov 30, 2023	Image Generation, In-Context Learning	Unverified
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model	Nov 30, 2023	Language Modeling, Language Modelling	Unverified
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation	Nov 25, 2023	Instruction Following, Language Modeling	Unverified
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model	Nov 10, 2023	Image Captioning, Language Modeling	Unverified
Multimodal Large Language Model for Visual Navigation	Oct 12, 2023	Language Modeling, Language Modelling	Unverified
Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips	Oct 1, 2023	Language Modeling, Language Modelling	Unverified
Investigating the Catastrophic Forgetting in Multimodal Large Language Models	Sep 19, 2023	image-classification, Image Classification	Unverified
Imaginations of WALL-E: Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems	Aug 20, 2023	Emotion Recognition, Language Modelling	Unverified
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning	Jul 18, 2023	Instruction Following, Language Modeling	Unverified
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding	Jul 4, 2023	document understanding, Language Modeling	Unverified
A Survey on Multimodal Large Language Models	Jun 23, 2023	Hallucination, In-Context Learning	Unverified
Language Is Not All You Need: Aligning Perception with Language Models	Feb 27, 2023	All, Image Captioning	Unverified
