| Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning | Dec 14, 2023 | cross-modal alignment, Decoder | —Unverified | 0 | 0 |
| Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration | Jun 12, 2025 | cross-modal alignment, Image to text | —Unverified | 0 | 0 |
| Improving speech translation by fusing speech and text | May 23, 2023 | cross-modal alignment, Machine Translation | —Unverified | 0 | 0 |
| InfoMAE: Pair-Efficient Cross-Modal Alignment for Multimodal Time-Series Sensing Signals | Apr 13, 2025 | cross-modal alignment, Self-Supervised Learning | —Unverified | 0 | 0 |
| Integrate Temporal Graph Learning into LLM-based Temporal Knowledge Graph Model | Jan 21, 2025 | cross-modal alignment, Graph Embedding | —Unverified | 0 | 0 |
| Intriguing Properties of Large Language and Vision Models | Oct 7, 2024 | cross-modal alignment, Large Language Model | —Unverified | 0 | 0 |
| JPG - Jointly Learn to Align: Automated Disease Prediction and Radiology Report Generation | Oct 1, 2022 | cross-modal alignment, Disease Prediction | —Unverified | 0 | 0 |
| KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | Jan 16, 2022 | cross-modal alignment, Knowledge Distillation | —Unverified | 0 | 0 |
| LangBridge: Interpreting Image as a Combination of Language Embeddings | Mar 25, 2025 | cross-modal alignment | —Unverified | 0 | 0 |
| Linguistic Query-Guided Mask Generation for Referring Image Segmentation | Jan 16, 2023 | Contrastive Learning, cross-modal alignment | —Unverified | 0 | 0 |