Multimodal Large Language Model

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 151–200 of 347 papers

Title	Date	Tasks	Status	Hype
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance	Dec 9, 2024	Image GenerationLanguage Modeling	—Unverified	0
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations	Dec 9, 2024	Language ModelingLanguage Modelling	CodeCode Available	1
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Dec 6, 2024	document understandingHallucination	CodeCode Available	0
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios	Dec 5, 2024	Language ModelingLanguage Modelling	—Unverified	0
Liquid: Language Models are Scalable Multi-modal Generators	Dec 5, 2024	Language ModelingLanguage Modelling	CodeCode Available	4
EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM	Dec 5, 2024	Image ManipulationLanguage Modeling	—Unverified	0
ObjectFinder: An Open-Vocabulary Assistive System for Interactive Object Search by Blind People	Dec 4, 2024	Large Language ModelMultimodal Large Language Model	—Unverified	0
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation	Dec 4, 2024	Image GenerationLarge Language Model	—Unverified	0
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning	Dec 4, 2024	Multimodal Large Language ModelVideo Understanding	CodeCode Available	1
WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image	Dec 3, 2024	DiagnosticLanguage Modeling	—Unverified	0
Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey	Dec 3, 2024	Change DetectionDescriptive	CodeCode Available	3
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models	Dec 2, 2024	Language ModelingLanguage Modelling	—Unverified	0
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model	Dec 2, 2024	Language ModelingLanguage Modelling	—Unverified	0
Realistic Corner Case Generation for Autonomous Vehicles with Multimodal Large Language Model	Nov 29, 2024	Autonomous VehiclesLanguage Modeling	—Unverified	0
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection	Nov 26, 2024	3D Object DetectionAutonomous Driving	CodeCode Available	2
Multimodal large language model for wheat breeding: a new exploration of smart breeding	Nov 20, 2024	Language ModelingLanguage Modelling	—Unverified	0
StreetviewLLM: Extracting Geographic Information Using a Chain-of-Thought Multimodal Large Language Model	Nov 19, 2024	Decision MakingLanguage Modeling	—Unverified	0
Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model	Nov 19, 2024	Language ModelingLanguage Modelling	—Unverified	0
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model	Nov 19, 2024	Information RetrievalLanguage Modeling	—Unverified	0
Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning	Nov 18, 2024	AttributeCompositional Zero-Shot Learning	CodeCode Available	1
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts	Nov 18, 2024	BenchmarkingMultimodal Large Language Model	CodeCode Available	0
Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning	Nov 17, 2024	Image CaptioningLanguage Modeling	CodeCode Available	0
Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model	Nov 16, 2024	Language ModelingLanguage Modelling	CodeCode Available	1
Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization	Nov 15, 2024	HallucinationHallucination Evaluation	—Unverified	0
MagicQuill: An Intelligent Interactive Image Editing System	Nov 14, 2024	Language ModelingLanguage Modelling	CodeCode Available	7
LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation	Nov 14, 2024	Earth ObservationInstruction Following	CodeCode Available	2
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models	Nov 11, 2024	2D Pose EstimationCategory-Agnostic Pose Estimation	—Unverified	0
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification	Nov 11, 2024	Large Language ModelMultimodal Large Language Model	CodeCode Available	2
TourSynbio-Search: A Large Language Model Driven Agent Framework for Unified Search Method for Protein Engineering	Nov 9, 2024	Information RetrievalLanguage Modeling	CodeCode Available	0
ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model	Nov 4, 2024	Language ModelingLanguage Modelling	—Unverified	0
Can Multimodal Large Language Model Think Analogically?	Nov 2, 2024	Language ModelingLanguage Modelling	—Unverified	0
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach	Oct 31, 2024	Language ModelingLanguage Modelling	—Unverified	0
Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench	Oct 29, 2024	Language ModelingLanguage Modelling	CodeCode Available	2
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms	Oct 24, 2024	DiversityLanguage Modeling	—Unverified	0
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks	Oct 24, 2024	image-classificationImage Classification	—Unverified	0
Meaning Typed Prompting: A Technique for Efficient, Reliable Structured Output Generation	Oct 22, 2024	Large Language ModelMultimodal Large Language Model	CodeCode Available	1
Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations	Oct 22, 2024	Camouflaged Object SegmentationLarge Language Model	CodeCode Available	0
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound	Oct 19, 2024	Instruction FollowingKnowledge Distillation	—Unverified	0
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation	Oct 19, 2024	AI AgentBenchmarking	CodeCode Available	2
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation	Oct 17, 2024	Decision MakingLanguage Modeling	CodeCode Available	1
MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description	Oct 15, 2024	Language ModelingLanguage Modelling	—Unverified	0
Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models	Oct 15, 2024	HallucinationLarge Language Model	CodeCode Available	0
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization	Oct 14, 2024	Explanation GenerationImage Forgery Detection	—Unverified	0
ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation	Oct 11, 2024	DiagnosticLanguage Modeling	—Unverified	0
Baichuan-Omni Technical Report	Oct 11, 2024	Language ModelingLanguage Modelling	CodeCode Available	3
Hespi: A pipeline for automatically detecting information from hebarium specimen sheets	Oct 11, 2024	Handwritten Text RecognitionHTR	CodeCode Available	1
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling	Oct 8, 2024	document understandingLanguage Modeling	CodeCode Available	2
RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction	Oct 7, 2024	Language ModelingLanguage Modelling	—Unverified	0
SCA: Improve Semantic Consistent in Unrestricted Adversarial Attacks via DDPM Inversion	Oct 3, 2024	Adversarial AttackDenoising	CodeCode Available	0
OCC-MLLM:Empowering Multimodal Large Language Model For the Understanding of Occluded Objects	Oct 2, 2024	Language ModelingLanguage Modelling	—Unverified	0

Show:10 25 50

← PrevPage 4 of 7Next →

No leaderboard results yet.