| Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model | Nov 16, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning | Nov 18, 2024 | Attribute, Compositional Zero-Shot Learning | Code Available | 1 | 5 |
| Enhancing Time Series Forecasting via Multi-Level Text Alignment with LLMs | Apr 10, 2025 | Multimodal Large Language Model, Time Series | Code Available | 1 | 5 |
| EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery | Jan 20, 2025 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model | Jun 17, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation | Aug 19, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 1 | 5 |
| Unifying Segment Anything in Microscopy with Multimodal Large Language Model | May 16, 2025 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation | Oct 17, 2024 | Decision Making, Language Modeling | Code Available | 1 | 5 |
| LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | Nov 20, 2023 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| TextToucher: Fine-Grained Text-to-Touch Generation | Sep 9, 2024 | Language Modelling, Large Language Model | Code Available | 1 | 5 |
| Chain of Images for Intuitively Reasoning | Nov 9, 2023 | Common Sense Reasoning, Language Modelling | Code Available | 1 | 5 |
| MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection | Dec 20, 2024 | Cancer Classification, Chatbot | Code Available | 1 | 5 |
| MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Aug 30, 2024 | Image Captioning, Language Modeling | Code Available | 1 | 5 |
| Voice Jailbreak Attacks Against GPT-4o | May 29, 2024 | Language Modelling, Large Language Model | Code Available | 1 | 5 |
| Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences | Jan 19, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis | Jan 17, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 1 | 5 |
| Meaning Typed Prompting: A Technique for Efficient, Reliable Structured Output Generation | Oct 22, 2024 | Large Language Model, Multimodal Large Language Model | Code Available | 1 | 5 |
| DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution | Jun 24, 2024 | Image Restoration, Image Super-Resolution | Code Available | 1 | 5 |
| Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions | Aug 5, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions | Mar 20, 2025 | 2D Object Detection, Distributed Computing | Code Available | 1 | 5 |
| LMEye: An Interactive Perception Network for Large Language Models | May 5, 2023 | Language Modelling, Large Language Model | Code Available | 1 | 5 |
| INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model | Jul 23, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors | Jun 20, 2024 | 16k, Instruction Following | Code Available | 1 | 5 |
| UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model | Oct 8, 2023 | Decoder, Language Modeling | Code Available | 1 | 5 |
| LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Dec 9, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models | Apr 1, 2024 | Decision Making, Language Modeling | Code Available | 1 | 5 |
| Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V | Oct 29, 2023 | Diagnostic, Language Modeling | Code Available | 1 | 5 |
| VideoQA in the Era of LLMs: An Empirical Study | Aug 8, 2024 | Multimodal Large Language Model, Video Question Answering | Code Available | 0 | 5 |
| Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models | May 26, 2025 | image-classification, Image Classification | Code Available | 0 | 5 |
| AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding | Aug 30, 2024 | Language Modelling, Large Language Model | Code Available | 0 | 5 |
| Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations | Oct 22, 2024 | Camouflaged Object Segmentation, Large Language Model | Code Available | 0 | 5 |
| Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Apr 2, 2025 | Descriptive, Large Language Model | Code Available | 0 | 5 |
| TourSynbio-Search: A Large Language Model Driven Agent Framework for Unified Search Method for Protein Engineering | Nov 9, 2024 | Information Retrieval, Language Modeling | Code Available | 0 | 5 |
| TRINS: Towards Multimodal Language Models that Can Read | Jun 10, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| VIS-Shepherd: Constructing Critic for LLM-based Data Visualization Generation | Jun 16, 2025 | Data Visualization, Language Modeling | Code Available | 0 | 5 |
| Cross-modal RAG: Sub-dimensional Retrieval-Augmented Text-to-Image Generation | May 28, 2025 | Image Generation, Language Modeling | Code Available | 0 | 5 |
| Consistency-aware Fake Videos Detection on Short Video Platforms | Apr 30, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 0 | 5 |
| Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning | May 10, 2025 | Image Augmentation, Large Language Model | Code Available | 0 | 5 |
| SCA: Improve Semantic Consistent in Unrestricted Adversarial Attacks via DDPM Inversion | Oct 3, 2024 | Adversarial Attack, Denoising | Code Available | 0 | 5 |
| Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model | May 28, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography | Jun 26, 2025 | Decipherment, Large Language Model | Code Available | 0 | 5 |
| Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models | Oct 15, 2024 | Hallucination, Large Language Model | Code Available | 0 | 5 |
| Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering | Dec 19, 2024 | Contrastive Learning, Language Modeling | Code Available | 0 | 5 |
| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Apr 9, 2025 | Autonomous Driving, Language Modeling | Code Available | 0 | 5 |
| MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Dec 27, 2024 | Autonomous Driving, Language Modeling | Code Available | 0 | 5 |
| Leveraging Multimodal LLM for Inspirational User Interface Search | Jan 29, 2025 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding | Sep 10, 2024 | Benchmarking, Language Modeling | Code Available | 0 | 5 |
| Dynamic Pyramid Network for Efficient Multimodal Large Language Model | Mar 26, 2025 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation | Sep 29, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Nov 17, 2024 | Image Captioning, Language Modeling | Code Available | 0 | 5 |