Multimodal Large Language Model

Papers

Showing 101–125 of 347 papers

Title	Date	Tasks	Status	Hype
TextToucher: Fine-Grained Text-to-Touch Generation	Sep 9, 2024	Language Modelling, Large Language Model	Code Available	1
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models	Aug 30, 2024	Image Captioning, Language Modeling	Code Available	1
ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding	Aug 21, 2024	Language Modeling, Language Modelling	Code Available	1
Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation	Aug 19, 2024	Large Language Model, Multimodal Large Language Model	Code Available	1
FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant	Aug 19, 2024	Descriptive, Face Swapping	Code Available	1
Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions	Aug 5, 2024	Language Modeling, Language Modelling	Code Available	1
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model	Jul 23, 2024	Language Modeling, Language Modelling	Code Available	1
A Refer-and-Ground Multimodal Large Language Model for Biomedicine	Jun 26, 2024	Language Modeling, Language Modelling	Code Available	1
DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution	Jun 24, 2024	Image Restoration, Image Super-Resolution	Code Available	1
LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors	Jun 20, 2024	16k, Instruction Following	Code Available	1
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model	Jun 17, 2024	Language Modeling, Language Modelling	Code Available	1
VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model	Jun 3, 2024	Image Outpainting, Language Modeling	Code Available	1
Voice Jailbreak Attacks Against GPT-4o	May 29, 2024	Language Modelling, Large Language Model	Code Available	1
From Text to Pixel: Advancing Long-Context Understanding in MLLMs	May 23, 2024	Language Modeling, Language Modelling	Code Available	1
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models	Apr 1, 2024	Decision Making, Language Modeling	Code Available	1
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception	Mar 5, 2024	Language Modeling, Language Modelling	Code Available	1
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences	Jan 19, 2024	Language Modeling, Language Modelling	Code Available	1
AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework	Dec 31, 2023	Large Language Model, Multimodal Large Language Model	Code Available	1
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model	Dec 12, 2023	Contrastive Learning, Hallucination	Code Available	1
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge	Nov 20, 2023	Language Modeling, Language Modelling	Code Available	1
Chain of Images for Intuitively Reasoning	Nov 9, 2023	Common Sense Reasoning, Language Modelling	Code Available	1
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V	Oct 29, 2023	Diagnostic, Language Modeling	Code Available	1
CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images	Oct 22, 2023	Diagnostic, Language Modeling	Code Available	1
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model	Oct 8, 2023	Decoder, Language Modeling	Code Available	1
FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis	Jul 31, 2023	Language Modeling, Language Modelling	Code Available	1
