| Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos | Sep 18, 2020 | cross-modal alignment, reinforcement-learning | —Unverified | 0 |
| Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval | May 22, 2025 | cross-modal alignment, Image-text Retrieval | —Unverified | 0 |
| Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models | Jun 15, 2023 | cross-modal alignment, Domain Generalization | —Unverified | 0 |
| Revisiting Misalignment in Multispectral Pedestrian Detection: A Language-Driven Approach for Cross-modal Alignment Fusion | Nov 27, 2024 | cross-modal alignment, Pedestrian Detection | —Unverified | 0 |
| Scene-Intuitive Agent for Remote Embodied Visual Grounding | Mar 24, 2021 | cross-modal alignment, Navigate | —Unverified | 0 |
| SE4Lip: Speech-Lip Encoder for Talking Head Synthesis to Solve Phoneme-Viseme Alignment Ambiguity | Apr 8, 2025 | 3DGS, cross-modal alignment | —Unverified | 0 |
| See What You See: Self-supervised Cross-modal Retrieval of Visual Stimuli from Brain Activity | Aug 7, 2022 | cross-modal alignment, Cross-Modal Retrieval | —Unverified | 0 |
| Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection | Jan 6, 2024 | Anomaly Detection, cross-modal alignment | —Unverified | 0 |
| Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training | Mar 1, 2024 | cross-modal alignment, Representation Learning | —Unverified | 0 |
| Semantic-Space-Intervened Diffusive Alignment for Visual Classification | May 9, 2025 | Classification, cross-modal alignment | —Unverified | 0 |