| ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance | Dec 9, 2024 | Image Generation, Language Modeling | Unverified | 0 |
| LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Dec 9, 2024 | Language Modeling | Code Available | 1 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Dec 6, 2024 | Document Understanding, Hallucination | Code Available | 0 |
| EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | Dec 5, 2024 | Language Modeling | Unverified | 0 |
| Liquid: Language Models are Scalable Multi-modal Generators | Dec 5, 2024 | Language Modeling | Code Available | 4 |
| EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM | Dec 5, 2024 | Image Manipulation, Language Modeling | Unverified | 0 |
| ObjectFinder: An Open-Vocabulary Assistive System for Interactive Object Search by Blind People | Dec 4, 2024 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |
| DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation | Dec 4, 2024 | Image Generation, Large Language Model | Unverified | 0 |
| Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Dec 4, 2024 | Multimodal Large Language Model, Video Understanding | Code Available | 1 |
| WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image | Dec 3, 2024 | Diagnostic, Language Modeling | Unverified | 0 |
| Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey | Dec 3, 2024 | Change Detection, Descriptive | Code Available | 3 |
| MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models | Dec 2, 2024 | Language Modeling | Unverified | 0 |
| SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model | Dec 2, 2024 | Language Modeling | Unverified | 0 |
| Realistic Corner Case Generation for Autonomous Vehicles with Multimodal Large Language Model | Nov 29, 2024 | Autonomous Vehicles, Language Modeling | Unverified | 0 |
| OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection | Nov 26, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| Multimodal large language model for wheat breeding: a new exploration of smart breeding | Nov 20, 2024 | Language Modeling | Unverified | 0 |
| StreetviewLLM: Extracting Geographic Information Using a Chain-of-Thought Multimodal Large Language Model | Nov 19, 2024 | Decision Making, Language Modeling | Unverified | 0 |
| Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model | Nov 19, 2024 | Language Modeling | Unverified | 0 |
| CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model | Nov 19, 2024 | Information Retrieval, Language Modeling | Unverified | 0 |
| Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning | Nov 18, 2024 | Attribute, Compositional Zero-Shot Learning | Code Available | 1 |
| Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Nov 18, 2024 | Benchmarking, Multimodal Large Language Model | Code Available | 0 |
| Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Nov 17, 2024 | Image Captioning, Language Modeling | Code Available | 0 |
| Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model | Nov 16, 2024 | Language Modeling | Code Available | 1 |
| Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization | Nov 15, 2024 | Hallucination, Hallucination Evaluation | Unverified | 0 |
| MagicQuill: An Intelligent Interactive Image Editing System | Nov 14, 2024 | Language Modeling | Code Available | 7 |