Multimodal Large Language Model

Papers

Showing 26–50 of 347 papers

Title	Date	Tasks	Status	Hype
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction	Feb 27, 2024	3D geometry, 3D Object Captioning	Code Available	3
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones	Dec 28, 2023	Computational Efficiency, Image Captioning	Code Available	3
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding	May 22, 2025	Language Modeling, Language Modelling	Code Available	2
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents	May 21, 2025	Large Language Model, Multimodal Large Language Model	Code Available	2
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer	Apr 14, 2025	Language Modeling, Language Modelling	Code Available	2
Referring to Any Person	Mar 11, 2025	Large Language Model, Multimodal Large Language Model	Code Available	2
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model	Mar 8, 2025	Image Quality Assessment, Language Modeling	Code Available	2
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model	Mar 6, 2025	General Knowledge, Image Captioning	Code Available	2
Introducing Visual Perception Token into Multimodal Large Language Model	Feb 24, 2025	Language Modeling, Language Modelling	Code Available	2
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data	Feb 12, 2025	cross-modal alignment, Large Language Model	Code Available	2
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding	Jan 14, 2025	image-classification, Image Classification	Code Available	2
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding	Jan 14, 2025	Feature Compression, Language Modeling	Code Available	2
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation	Jan 11, 2025	Chart Understanding, Code Generation	Code Available	2
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine	Dec 12, 2024	Language Modeling, Language Modelling	Code Available	2
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection	Nov 26, 2024	3D Object Detection, Autonomous Driving	Code Available	2
LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation	Nov 14, 2024	Earth Observation, Instruction Following	Code Available	2
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification	Nov 11, 2024	Large Language Model, Multimodal Large Language Model	Code Available	2
Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench	Oct 29, 2024	Language Modeling, Language Modelling	Code Available	2
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation	Oct 19, 2024	AI Agent, Benchmarking	Code Available	2
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling	Oct 8, 2024	document understanding, Language Modeling	Code Available	2
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos	Sep 29, 2024	All, Image Segmentation	Code Available	2
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling	Sep 28, 2024	image-classification, Image Classification	Code Available	2
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area	Aug 14, 2024	Language Modeling, Language Modelling	Code Available	2
MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis	Aug 14, 2024	Anomaly Detection, Boundary Detection	Code Available	2
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation	Jul 19, 2024	Attribute, Language Modeling	Code Available	2

