| GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency | Dec 12, 2024 | cross-modal alignment, Transfer Learning | Code Available | 1 |
| GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning | Dec 10, 2024 | cross-modal alignment, Video Understanding | Unverified | 0 |
| Towards Brain Passage Retrieval -- An Investigation of EEG Query Representations | Dec 9, 2024 | cross-modal alignment, EEG | Unverified | 0 |
| CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | Dec 5, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model | Dec 2, 2024 | cross-modal alignment, Knowledge Distillation | Code Available | 1 |
| AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment | Dec 1, 2024 | cross-modal alignment, Mamba | Unverified | 0 |
| SimCMF: A Simple Cross-modal Fine-tuning Strategy from Vision Foundation Models to Any Imaging Modality | Nov 27, 2024 | cross-modal alignment | Code Available | 1 |
| Revisiting Misalignment in Multispectral Pedestrian Detection: A Language-Driven Approach for Cross-modal Alignment Fusion | Nov 27, 2024 | cross-modal alignment, Pedestrian Detection | Unverified | 0 |
| Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge | Nov 21, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis | Nov 1, 2024 | cross-modal alignment, Phenotype classification | Unverified | 0 |
| Towards Cross-Modal Text-Molecule Retrieval with Better Modality Alignment | Oct 31, 2024 | Contrastive Learning, cross-modal alignment | Code Available | 0 |
| Multi-path Exploration and Feedback Adjustment for Text-to-Image Person Retrieval | Oct 26, 2024 | cross-modal alignment, Person Retrieval | Unverified | 0 |
| AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition | Oct 21, 2024 | cross-modal alignment, speech-recognition | Code Available | 1 |
| Modeling the Human Visual System: Comparative Insights from Response-Optimized and Task-Optimized Vision Models, Language Models, and different Readout Mechanisms | Oct 17, 2024 | cross-modal alignment, Large Language Model | Unverified | 0 |
| Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding | Oct 17, 2024 | cross-modal alignment, Sentence | Unverified | 0 |
| LESS: Label-Efficient and Single-Stage Referring 3D Segmentation | Oct 17, 2024 | cross-modal alignment, Instance Segmentation | Code Available | 1 |
| OMCAT: Omni Context Aware Transformer | Oct 15, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Unverified | 0 |
| Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective | Oct 14, 2024 | cross-modal alignment, Image Generation | Code Available | 0 |
| Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate | Oct 9, 2024 | cross-modal alignment, Visual Question Answering | Code Available | 2 |
| EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment | Oct 8, 2024 | cross-modal alignment, Hallucination | Unverified | 0 |
| Intriguing Properties of Large Language and Vision Models | Oct 7, 2024 | cross-modal alignment, Large Language Model | Unverified | 0 |
| TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation | Oct 5, 2024 | cross-modal alignment, Retrieval | Unverified | 0 |
| Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners | Oct 3, 2024 | cross-modal alignment | Code Available | 1 |
| Melody-Guided Music Generation | Sep 30, 2024 | cross-modal alignment, Music Generation | Code Available | 2 |
| Fully Aligned Network for Referring Image Segmentation | Sep 29, 2024 | cross-modal alignment, Decoder | Unverified | 0 |