Multimodal Large Language Model

Papers


Showing 201–250 of 347 papers

Title	Date	Tasks	Status	Hype
VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection	Sep 30, 2024	Anomaly Detection, Language Modeling	Unverified	0
MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation	Sep 29, 2024	Language Modeling, Language Modelling	Code Available	0
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos	Sep 29, 2024	All, Image Segmentation	Code Available	2
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling	Sep 28, 2024	image-classification, Image Classification	Code Available	2
EAGLE: Egocentric AGgregated Language-video Engine	Sep 26, 2024	Action Recognition, Activity Recognition	Unverified	0
CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches	Sep 26, 2024	Language Modeling, Language Modelling	Unverified	0
CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation	Sep 24, 2024	Contrastive Learning, Language Modeling	Unverified	0
Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference	Sep 18, 2024	Image Captioning, Large Language Model	Unverified	0
MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding	Sep 10, 2024	Benchmarking, Language Modeling	Code Available	0
Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles	Sep 10, 2024	Autonomous Vehicles, Language Modeling	Unverified	0
MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning	Sep 9, 2024	Federated Learning, Image Captioning	Unverified	0
TextToucher: Fine-Grained Text-to-Touch Generation	Sep 9, 2024	Language Modelling, Large Language Model	Code Available	1
A Medical Multimodal Large Language Model for Pediatric Pneumonia	Sep 4, 2024	Diagnostic, Language Modeling	Unverified	0
DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing	Sep 2, 2024	Image Generation, Language Modelling	Unverified	0
Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction	Sep 2, 2024	Language Modeling, Language Modelling	Unverified	0
Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model	Sep 1, 2024	Language Modeling, Language Modelling	Unverified	0
OrthoDoc: Multimodal Large Language Model for Assisting Diagnosis in Computed Tomography	Aug 30, 2024	Computed Tomography (CT), Diagnostic	Unverified	0
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models	Aug 30, 2024	Image Captioning, Language Modeling	Code Available	1
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding	Aug 30, 2024	Language Modelling, Large Language Model	Code Available	0
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model	Aug 22, 2024	Language Modeling, Language Modelling	Unverified	0
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese	Aug 22, 2024	Language Modeling, Language Modelling	Unverified	0
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion	Aug 21, 2024	Language Modelling, Large Language Model	Unverified	0
ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding	Aug 21, 2024	Language Modeling, Language Modelling	Code Available	1
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model	Aug 21, 2024	Computational Efficiency, Language Modeling	Unverified	0
Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model	Aug 21, 2024	Emotion Recognition, Language Modeling	Unverified	0
Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation	Aug 19, 2024	Large Language Model, Multimodal Large Language Model	Code Available	1
FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant	Aug 19, 2024	Descriptive, Face Swapping	Code Available	1
PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis	Aug 18, 2024	Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA)	Unverified	0
MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis	Aug 14, 2024	Anomaly Detection, Boundary Detection	Code Available	2
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area	Aug 14, 2024	Language Modeling, Language Modelling	Code Available	2
ChatGPT Meets Iris Biometrics	Aug 9, 2024	Face Recognition, Iris Recognition	Unverified	0
VITA: Towards Open-Source Interactive Omni Multimodal LLM	Aug 9, 2024	Language Modeling, Language Modelling	Code Available	7
VideoQA in the Era of LLMs: An Empirical Study	Aug 8, 2024	Multimodal Large Language Model, Video Question Answering	Code Available	0
Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions	Aug 5, 2024	Language Modeling, Language Modelling	Code Available	1
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks	Jul 29, 2024	Deep Learning, Domain Generalization	Unverified	0
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models	Jul 27, 2024	Language Modeling, Language Modelling	Unverified	0
Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models	Jul 26, 2024	Disentanglement, Language Modeling	Unverified	0
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic	Jul 25, 2024	Image to text, Language Modeling	Unverified	0
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model	Jul 23, 2024	Language Modeling, Language Modelling	Code Available	1
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation	Jul 19, 2024	Attribute, Language Modeling	Code Available	2
Visual Text Generation in the Wild	Jul 19, 2024	Language Modelling, Large Language Model	Code Available	0
UrbanWorld: An Urban World Model for 3D City Generation	Jul 16, 2024	Decision Making, Language Modelling	Code Available	2
A Neural Matrix Decomposition Recommender System Model based on the Multimodal Large Language Model	Jul 12, 2024	Language Modeling, Language Modelling	Unverified	0
SEED-Story: Multimodal Long Story Generation with Large Language Model	Jul 11, 2024	Image Generation, Language Modeling	Code Available	4
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing	Jul 8, 2024	Image Generation, Language Modeling	Unverified	0
MobileFlow: A Multimodal LLM For Mobile GUI Agent	Jul 5, 2024	Action Analysis, Language Modelling	Unverified	0
MRIR: Integrating Multimodal Insights for Diffusion-based Realistic Image Restoration	Jul 4, 2024	Denoising, Image Restoration	Unverified	0
A Refer-and-Ground Multimodal Large Language Model for Biomedicine	Jun 26, 2024	Language Modeling, Language Modelling	Code Available	1
Guardrails for avoiding harmful medical product recommendations and off-label promotion in generative AI models	Jun 24, 2024	Language Modeling, Language Modelling	Unverified	0
DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution	Jun 24, 2024	Image Restoration, Image Super-Resolution	Code Available	1


