| AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment | Dec 1, 2024 | cross-modal alignment, Mamba | — Unverified | 0 |
| SimCMF: A Simple Cross-modal Fine-tuning Strategy from Vision Foundation Models to Any Imaging Modality | Nov 27, 2024 | cross-modal alignment | Code Available | 1 |
| Revisiting Misalignment in Multispectral Pedestrian Detection: A Language-Driven Approach for Cross-modal Alignment Fusion | Nov 27, 2024 | cross-modal alignment, Pedestrian Detection | — Unverified | 0 |
| Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge | Nov 21, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | — Unverified | 0 |
| CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis | Nov 1, 2024 | cross-modal alignment, Phenotype classification | — Unverified | 0 |
| Towards Cross-Modal Text-Molecule Retrieval with Better Modality Alignment | Oct 31, 2024 | Contrastive Learning, cross-modal alignment | Code Available | 0 |
| Multi-path Exploration and Feedback Adjustment for Text-to-Image Person Retrieval | Oct 26, 2024 | cross-modal alignment, Person Retrieval | — Unverified | 0 |
| AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition | Oct 21, 2024 | cross-modal alignment, speech-recognition | Code Available | 1 |
| Modeling the Human Visual System: Comparative Insights from Response-Optimized and Task-Optimized Vision Models, Language Models, and different Readout Mechanisms | Oct 17, 2024 | cross-modal alignment, Large Language Model | — Unverified | 0 |
| Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding | Oct 17, 2024 | cross-modal alignment, Sentence | — Unverified | 0 |