| COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking | Apr 2, 2025 | cross-modal alignmentObject | —Unverified | 0 |
| FineLIP: Extending CLIP's Reach via Fine-Grained Alignment with Longer Text Inputs | Apr 2, 2025 | cross-modal alignmentCross-Modal Retrieval | —Unverified | 0 |
| SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering | Apr 1, 2025 | cross-modal alignmentQuestion Answering | —Unverified | 0 |
| CADFormer: Fine-Grained Cross-modal Alignment and Decoding Transformer for Referring Remote Sensing Image Segmentation | Mar 30, 2025 | cross-modal alignmentImage Segmentation | —Unverified | 0 |
| NeuroLIP: Interpretable and Fair Cross-Modal Alignment of fMRI and Phenotypic Text | Mar 27, 2025 | AttributeContrastive Learning | —Unverified | 0 |
| GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations | Mar 26, 2025 | cross-modal alignmentEmotion Classification | —Unverified | 0 |
| AutoRad-Lung: A Radiomic-Guided Prompting Autoregressive Vision-Language Model for Lung Nodule Malignancy Prediction | Mar 26, 2025 | Computed Tomography (CT)cross-modal alignment | —Unverified | 0 |
| LangBridge: Interpreting Image as a Combination of Language Embeddings | Mar 25, 2025 | cross-modal alignment | —Unverified | 0 |
| Shushing! Let's Imagine an Authentic Speech from the Silent Video | Mar 19, 2025 | cross-modal alignmentLanguage Modeling | —Unverified | 0 |
| Language-based Image Colorization: A Benchmark and Beyond | Mar 19, 2025 | BenchmarkingColorization | CodeCode Available | 0 |