Multimodal Large Language Model

Papers

Showing 26–50 of 347 papers

Title	Date	Tasks	Status	Hype	Score
Valley2: Exploring Multimodal Models with Scalable Vision-Language Design	Jan 10, 2025	Image Captioning, Language Modeling	Code Available	3	5
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction	Feb 27, 2024	3D geometry, 3D Object Captioning	Code Available	3	5
Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM	Jun 18, 2024	Anomaly Detection, Anomaly Localization	Code Available	2	5
LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation	Nov 14, 2024	Earth Observation, Instruction Following	Code Available	2	5
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area	Aug 14, 2024	Language Modeling, Language Modelling	Code Available	2	5
Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench	Oct 29, 2024	Language Modeling, Language Modelling	Code Available	2	5
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering	Feb 4, 2024	Language Modeling, Language Modelling	Code Available	2	5
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding	Jan 14, 2025	image-classification, Image Classification	Code Available	2	5
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling	Oct 8, 2024	document understanding, Language Modeling	Code Available	2	5
Referring to Any Person	Mar 11, 2025	Large Language Model, Multimodal Large Language Model	Code Available	2	5
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos	Sep 29, 2024	All, Image Segmentation	Code Available	2	5
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model	Mar 8, 2025	Image Quality Assessment, Language Modeling	Code Available	2	5
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection	Nov 26, 2024	3D Object Detection, Autonomous Driving	Code Available	2	5
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models	Jun 23, 2023	Benchmarking, Language Modeling	Code Available	2	5
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data	Feb 12, 2025	cross-modal alignment, Large Language Model	Code Available	2	5
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want	Mar 29, 2024	Instruction Following, Language Modelling	Code Available	2	5
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast	Feb 13, 2024	Language Modelling, Large Language Model	Code Available	2	5
LaVy: Vietnamese Multimodal Large Language Model	Apr 11, 2024	Language Modeling, Language Modelling	Code Available	2	5
Explore the Limits of Omni-modal Pretraining at Scale	Jun 13, 2024	Language Modeling, Language Modelling	Code Available	2	5
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling	Sep 28, 2024	image-classification, Image Classification	Code Available	2	5
LLMGA: Multimodal Large Language Model based Generation Assistant	Nov 27, 2023	Image Generation, Language Modeling	Code Available	2	5
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation	Jan 11, 2025	Chart Understanding, Code Generation	Code Available	2	5
Introducing Visual Perception Token into Multimodal Large Language Model	Feb 24, 2025	Language Modeling, Language Modelling	Code Available	2	5
Jailbreaking Attack against Multimodal Large Language Model	Feb 4, 2024	Language Modeling, Language Modelling	Code Available	2	5
MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis	Aug 14, 2024	Anomaly Detection, Boundary Detection	Code Available	2	5