Multimodal Large Language Model

Papers

Showing 301–347 of 347 papers

Title	Date	Tasks	Status
Automated radiotherapy treatment planning guided by GPT-4Vision	Jun 21, 2024	In-Context Learning, Language Modelling	Unverified
The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge	Jun 18, 2024	Few-Shot Object Detection, Language Modeling	Unverified
TRINS: Towards Multimodal Language Models that Can Read	Jun 10, 2024	Language Modeling, Language Modelling	Code Available
Efficient Indirect LLM Jailbreak via Multimodal-LLM Jailbreak	May 30, 2024	Language Modeling, Language Modelling	Unverified
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model	May 28, 2024	Language Modeling, Language Modelling	Code Available
Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation	May 27, 2024	Instruction Following, Language Modeling	Unverified
V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM	May 24, 2024	Language Modelling, Large Language Model	Code Available
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability	May 23, 2024	cross-modal alignment, Language Modelling	Unverified
Layout Generation Agents with Large Language Models	May 13, 2024	Language Modeling, Language Modelling	Code Available
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition	May 7, 2024	Large Language Model, Multimodal Large Language Model	Unverified
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites	Apr 25, 2024	4k, Language Modeling	Code Available
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation	Apr 23, 2024	Image Generation, Language Modeling	Unverified
RAGAR, Your Falsehood Radar: RAG-Augmented Reasoning for Political Fact-Checking using Multimodal Large Language Models	Apr 18, 2024	Fact Checking, Language Modeling	Unverified
GUIDE: Graphical User Interface Data for Execution	Apr 9, 2024	Language Modelling, Large Language Model	Unverified
Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security	Apr 8, 2024	Language Modeling, Language Modelling	Unverified
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization	Apr 4, 2024	Grasp Generation, Language Modeling	Unverified
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition	Mar 22, 2024	Language Modelling, Large Language Model	Unverified
VL-Mamba: Exploring State Space Models for Multimodal Learning	Mar 20, 2024	Language Modeling, Language Modelling	Unverified
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization	Mar 13, 2024	Language Modeling, Language Modelling	Unverified
Multimodal Transformer for Comics Text-Cloze	Mar 6, 2024	Language Modeling, Language Modelling	Unverified
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection	Mar 5, 2024	Concept Alignment, Explanation Generation	Unverified
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery	Feb 28, 2024	Knowledge Distillation, Language Modeling	Unverified
LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery	Feb 26, 2024	Continual Learning, Exemplar-Free	Code Available
MMMModal -- Multi-Images Multi-Audio Multi-turn Multi-Modal	Feb 17, 2024	Language Modeling, Language Modelling	Unverified
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks	Feb 13, 2024	Language Modeling, Language Modelling	Unverified
Lumos : Empowering Multimodal LLMs with Scene Text Recognition	Feb 12, 2024	Language Modeling, Language Modelling	Unverified
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education	Feb 9, 2024	Benchmarking, Chatbot	Unverified
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs	Jan 29, 2024	Language Modelling, Large Language Model	Unverified
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion	Jan 24, 2024	Conditional Image Generation, Denoising	Unverified
MLLMReID: Multimodal Large Language Model-based Person Re-identification	Jan 24, 2024	Language Modeling, Language Modelling	Unverified
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation	Jan 1, 2024	Image Generation, Language Modeling	Unverified
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation	Dec 24, 2023	Common Sense Reasoning, Language Modeling	Unverified
Audio-Visual LLM for Video Understanding	Dec 11, 2023	AudioCaps, Language Modeling	Unverified
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model	Dec 5, 2023	Boundary Detection, Language Modeling	Unverified
MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation	Dec 4, 2023	Instruction Following, Language Modeling	Unverified
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation	Nov 30, 2023	Image Generation, In-Context Learning	Unverified
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model	Nov 30, 2023	Language Modeling, Language Modelling	Code Available
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation	Nov 25, 2023	Instruction Following, Language Modeling	Unverified
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model	Nov 10, 2023	Image Captioning, Language Modeling	Unverified
Multimodal Large Language Model for Visual Navigation	Oct 12, 2023	Language Modeling, Language Modelling	Unverified
Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips	Oct 1, 2023	Language Modeling, Language Modelling	Unverified
Investigating the Catastrophic Forgetting in Multimodal Large Language Models	Sep 19, 2023	image-classification, Image Classification	Unverified
Imaginations of WALL-E : Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems	Aug 20, 2023	Emotion Recognition, Language Modelling	Unverified
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning	Jul 18, 2023	Instruction Following, Language Modeling	Unverified
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding	Jul 4, 2023	document understanding, Language Modeling	Code Available
A Survey on Multimodal Large Language Models	Jun 23, 2023	Hallucination, In-Context Learning	Code Available
Language Is Not All You Need: Aligning Perception with Language Models	Feb 27, 2023	All, Image Captioning	Code Available
