Multimodal Large Language Model

Papers


Showing 251–300 of 347 papers

Title	Date	Tasks	Status	Hype
MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception	Jun 22, 2024	Common Sense Reasoning, Language Modelling	Unverified	0
Automated radiotherapy treatment planning guided by GPT-4Vision	Jun 21, 2024	In-Context Learning, Language Modelling	Unverified	0
LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors	Jun 20, 2024	16k, Instruction Following	Code Available	1
The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge	Jun 18, 2024	Few-Shot Object Detection, Language Modeling	Unverified	0
Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM	Jun 18, 2024	Anomaly Detection, Anomaly Localization	Code Available	2
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model	Jun 17, 2024	Language Modeling, Language Modelling	Code Available	1
Explore the Limits of Omni-modal Pretraining at Scale	Jun 13, 2024	Language Modeling, Language Modelling	Code Available	2
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks	Jun 12, 2024	Image Generation, Language Modeling	Code Available	5
Multimodal Table Understanding	Jun 12, 2024	Language Modeling, Language Modelling	Code Available	3
TRINS: Towards Multimodal Language Models that Can Read	Jun 10, 2024	Language Modeling, Language Modelling	Code Available	0
VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model	Jun 3, 2024	Image Outpainting, Language Modeling	Code Available	1
Ovis: Structural Embedding Alignment for Multimodal Large Language Model	May 31, 2024	Language Modeling, Multimodal Large Language Model	Code Available	5
Efficient Indirect LLM Jailbreak via Multimodal-LLM Jailbreak	May 30, 2024	Language Modeling, Language Modelling	Unverified	0
Voice Jailbreak Attacks Against GPT-4o	May 29, 2024	Language Modelling, Large Language Model	Code Available	1
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model	May 28, 2024	Language Modeling, Language Modelling	Code Available	0
Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation	May 27, 2024	Instruction Following, Language Modeling	Unverified	0
A Survey of Multimodal Large Language Model from A Data-centric Perspective	May 26, 2024	Language Modeling, Language Modelling	Code Available	2
V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM	May 24, 2024	Language Modelling, Large Language Model	Code Available	0
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability	May 23, 2024	cross-modal alignment, Language Modelling	Unverified	0
From Text to Pixel: Advancing Long-Context Understanding in MLLMs	May 23, 2024	Language Modeling, Language Modelling	Code Available	1
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding	May 14, 2024	Image Generation, Language Modeling	Code Available	7
Layout Generation Agents with Large Language Models	May 13, 2024	Language Modeling, Language Modelling	Code Available	0
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition	May 7, 2024	Large Language Model, Multimodal Large Language Model	Unverified	0
SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing	May 7, 2024	Image Manipulation, Language Modeling	Code Available	4
WorldGPT: Empowering LLM as Multimodal World Model	Apr 28, 2024	Language Modeling, Language Modelling	Code Available	2
Paint by Inpaint: Learning to Add Image Objects by Removing Them First	Apr 28, 2024	Image Inpainting, Language Modeling	Code Available	2
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites	Apr 25, 2024	4k, Language Modeling	Code Available	0
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation	Apr 23, 2024	Image Generation, Language Modeling	Unverified	0
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models	Apr 19, 2024	Language Modeling, Language Modelling	Code Available	4
RAGAR, Your Falsehood Radar: RAG-Augmented Reasoning for Political Fact-Checking using Multimodal Large Language Models	Apr 18, 2024	Fact Checking, Language Modeling	Unverified	0
Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification	Apr 16, 2024	Feature Engineering, Language Modeling	Code Available	3
LaVy: Vietnamese Multimodal Large Language Model	Apr 11, 2024	Language Modeling, Language Modelling	Code Available	2
UMBRAE: Unified Multimodal Brain Decoding	Apr 10, 2024	Brain Decoding, Language Modeling	Code Available	2
GUIDE: Graphical User Interface Data for Execution	Apr 9, 2024	Language Modelling, Large Language Model	Unverified	0
Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security	Apr 8, 2024	Language Modeling, Language Modelling	Unverified	0
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation	Apr 8, 2024	Image Generation, Image-to-Image Translation	Code Available	3
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	Apr 4, 2024	Language Modeling, Language Modelling	Code Available	4
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization	Apr 4, 2024	Grasp Generation, Language Modeling	Unverified	0
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models	Apr 1, 2024	Decision Making, Language Modeling	Code Available	1
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want	Mar 29, 2024	Instruction Following, Language Modelling	Code Available	2
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition	Mar 22, 2024	Language Modelling, Large Language Model	Unverified	0
VL-Mamba: Exploring State Space Models for Multimodal Learning	Mar 20, 2024	Language Modeling, Language Modelling	Unverified	0
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization	Mar 13, 2024	Language Modeling, Language Modelling	Unverified	0
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios	Mar 7, 2024	Audio-visual Question Answering, Audio-Visual Question Answering (AVQA)	Code Available	2
Multimodal Transformer for Comics Text-Cloze	Mar 6, 2024	Language Modeling, Language Modelling	Unverified	0
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception	Mar 5, 2024	Language Modeling, Language Modelling	Code Available	1
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection	Mar 5, 2024	Concept Alignment, Explanation Generation	Unverified	0
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery	Feb 28, 2024	Knowledge Distillation, Language Modeling	Unverified	0
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction	Feb 27, 2024	3D geometry, 3D Object Captioning	Code Available	3
LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery	Feb 26, 2024	Continual Learning, Exemplar-Free	Code Available	0

