| Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models | May 8, 2025 | Active Learning, cross-modal alignment | Code Available | 0 |
| PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing | May 6, 2025 | cross-modal alignment | Unverified | 0 |
| CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment | May 2, 2025 | audio-visual learning, cross-modal alignment | Code Available | 1 |
| MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Apr 29, 2025 | cross-modal alignment, Decoder | Code Available | 0 |
| A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models | Apr 22, 2025 | cross-modal alignment, Script Generation | Unverified | 0 |
| Cross-attention for State-based model RWKV-7 | Apr 19, 2025 | cross-modal alignment, Image Generation | Code Available | 0 |
| TMCIR: Token Merge Benefits Composed Image Retrieval | Apr 15, 2025 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| InfoMAE: Pair-Efficient Cross-Modal Alignment for Multimodal Time-Series Sensing Signals | Apr 13, 2025 | cross-modal alignment, Self-Supervised Learning | Unverified | 0 |
| 3D CoCa: Contrastive Learners are 3D Captioners | Apr 13, 2025 | 3D dense captioning, Caption Generation | Code Available | 0 |
| VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering | Apr 11, 2025 | cross-modal alignment, Information Retrieval | Unverified | 0 |
| SE4Lip: Speech-Lip Encoder for Talking Head Synthesis to Solve Phoneme-Viseme Alignment Ambiguity | Apr 8, 2025 | 3DGS, cross-modal alignment | Unverified | 0 |
| Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification | Apr 8, 2025 | cross-modal alignment, Image Classification | Code Available | 0 |
| Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision | Apr 3, 2025 | 3D Object Detection, cross-modal alignment | Code Available | 1 |
| FineLIP: Extending CLIP's Reach via Fine-Grained Alignment with Longer Text Inputs | Apr 2, 2025 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 |
| Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval | Apr 2, 2025 | cross-modal alignment, Retrieval | Unverified | 0 |
| COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking | Apr 2, 2025 | cross-modal alignment, Object | Code Available | 0 |
| DF-Calib: Targetless LiDAR-Camera Calibration via Depth Flow | Apr 2, 2025 | Autonomous Driving, Camera Calibration | Unverified | 0 |
| SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering | Apr 1, 2025 | cross-modal alignment, Question Answering | Unverified | 0 |
| CADFormer: Fine-Grained Cross-modal Alignment and Decoding Transformer for Referring Remote Sensing Image Segmentation | Mar 30, 2025 | cross-modal alignment, Image Segmentation | Unverified | 0 |
| BiPVL-Seg: Bidirectional Progressive Vision-Language Fusion with Global-Local Alignment for Medical Image Segmentation | Mar 30, 2025 | cross-modal alignment, Image Segmentation | Code Available | 1 |
| NeuroLIP: Interpretable and Fair Cross-Modal Alignment of fMRI and Phenotypic Text | Mar 27, 2025 | Attribute, Contrastive Learning | Unverified | 0 |
| GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations | Mar 26, 2025 | cross-modal alignment, Emotion Classification | Unverified | 0 |
| AutoRad-Lung: A Radiomic-Guided Prompting Autoregressive Vision-Language Model for Lung Nodule Malignancy Prediction | Mar 26, 2025 | Computed Tomography (CT), cross-modal alignment | Unverified | 0 |
| LangBridge: Interpreting Image as a Combination of Language Embeddings | Mar 25, 2025 | cross-modal alignment | Unverified | 0 |
| LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation | Mar 25, 2025 | cross-modal alignment, Open Vocabulary Semantic Segmentation | Code Available | 1 |