| DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | May 31, 2024 | cross-modal alignment, Visual Localization | Code Available | 2 |
| Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval | May 29, 2024 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment | May 28, 2024 | cross-modal alignment | Code Available | 2 |
| OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All | May 25, 2024 | All, cross-modal alignment | Unverified | 0 |
| Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation | May 23, 2024 | cross-modal alignment, Decoder | Code Available | 1 |
| AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability | May 23, 2024 | cross-modal alignment, Language Modelling | Unverified | 0 |
| Context-Enhanced Video Moment Retrieval with Large Language Models | May 21, 2024 | cross-modal alignment, Language Modeling | Unverified | 0 |
| Factual Serialization Enhancement: A Key Innovation for Chest X-ray Report Generation | May 15, 2024 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| Listen Then See: Video Alignment with Speaker Attention | Apr 21, 2024 | cross-modal alignment, Question Answering | Code Available | 0 |
| HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding | Apr 20, 2024 | cross-modal alignment, Visual Grounding | Code Available | 2 |