| Fine-grained Semantic Alignment Network for Weakly Supervised Temporal Language Grounding | Oct 21, 2022 | cross-modal alignmentSentence | —Unverified | 0 |
| Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation | Oct 18, 2022 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 0 |
| Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval | Oct 17, 2022 | cross-modal alignmentObject | —Unverified | 0 |
| Video Referring Expression Comprehension via Transformer with Content-aware Query | Oct 6, 2022 | cross-modal alignmentReferring Expression | —Unverified | 0 |
| JPG - Jointly Learn to Align: Automated Disease Prediction and Radiology Report Generation | Oct 1, 2022 | cross-modal alignmentDisease Prediction | —Unverified | 0 |
| TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval | Sep 28, 2022 | cross-modal alignmentRetrieval | —Unverified | 0 |
| Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection | Sep 28, 2022 | 2D Object Detectioncross-modal alignment | —Unverified | 0 |
| Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval | Sep 23, 2022 | cross-modal alignmentInformation Retrieval | —Unverified | 0 |
| OmniVL:One Foundation Model for Image-Language and Video-Language Tasks | Sep 15, 2022 | Action ClassificationAction Recognition | —Unverified | 0 |
| See What You See: Self-supervised Cross-modal Retrieval of Visual Stimuli from Brain Activity | Aug 7, 2022 | cross-modal alignmentCross-Modal Retrieval | —Unverified | 0 |
| Masked Vision and Language Modeling for Multi-modal Representation Learning | Aug 3, 2022 | cross-modal alignmentLanguage Modeling | —Unverified | 0 |
| Cross-Modal Alignment Learning of Vision-Language Conceptual Systems | Jul 31, 2022 | cross-modal alignmentRepresentation Learning | —Unverified | 0 |
| A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues | Jul 24, 2022 | cross-modal alignmentTrajectory Planning | CodeCode Available | 0 |
| VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix | Jun 17, 2022 | Contrastive Learningcross-modal alignment | —Unverified | 0 |
| Reinforced Cross-modal Alignment for Radiology Report Generation | May 1, 2022 | cross-modal alignmentDecision Making | CodeCode Available | 0 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Apr 18, 2022 | cross-modal alignmentDocument AI | CodeCode Available | 0 |
| mSLAM: Massively multilingual joint pre-training for speech and text | Feb 3, 2022 | cross-modal alignmentintent-classification | —Unverified | 0 |
| ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding | Jan 16, 2022 | cross-modal alignmentDocument Classification | CodeCode Available | 0 |
| KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | Jan 16, 2022 | cross-modal alignmentKnowledge Distillation | —Unverified | 0 |
| Learning Better Visual Representations for Weakly-Supervised Object Detection Using Natural Language Supervision | Sep 29, 2021 | cross-modal alignmentobject-detection | —Unverified | 0 |
| KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | Sep 22, 2021 | cross-modal alignmentKnowledge Distillation | CodeCode Available | 0 |
| Learning Joint Embedding with Modality Alignments for Cross-Modal Retrieval of Recipes and Food Images | Aug 9, 2021 | cross-modal alignmentCross-Modal Retrieval | —Unverified | 0 |
| Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval | Aug 5, 2021 | cross-modal alignmentRetrieval | —Unverified | 0 |
| Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information | Apr 19, 2021 | cross-modal alignmentNavigate | CodeCode Available | 0 |
| Continual learning in cross-modal retrieval | Apr 14, 2021 | Continual Learningcross-modal alignment | —Unverified | 0 |
| Scene-Intuitive Agent for Remote Embodied Visual Grounding | Mar 24, 2021 | cross-modal alignmentNavigate | —Unverified | 0 |
| Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze | Nov 9, 2020 | cross-modal alignmentImage Captioning | CodeCode Available | 0 |
| Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags | Oct 27, 2020 | cross-modal alignmentRepresentation Learning | CodeCode Available | 0 |
| ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding | Oct 23, 2020 | cross-modal alignmentLanguage Modeling | —Unverified | 0 |
| Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos | Sep 18, 2020 | cross-modal alignmentreinforcement-learning | —Unverified | 0 |
| Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation | Sep 17, 2020 | cross-modal alignmentImage to text | —Unverified | 0 |
| Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation | Aug 4, 2020 | 2D Pose Estimation3D Human Pose Estimation | CodeCode Available | 0 |
| Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm | Jun 3, 2020 | cross-modal alignmentGeneral Classification | —Unverified | 0 |
| Cross-Modal Cross-Domain Moment Alignment Network for Person Search | Jun 1, 2020 | cross-modal alignmentPerson Search | —Unverified | 0 |
| Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models | May 15, 2020 | coreference-resolutionCoreference Resolution | —Unverified | 0 |
| Continuous Sign Language Recognition Through Cross-Modal Alignment of Video and Text Embeddings in a Joint-Latent Space | May 11, 2020 | cross-modal alignmentDecoder | —Unverified | 0 |
| MCQA: Multimodal Co-attention Based Network for Question Answering | Apr 25, 2020 | cross-modal alignmentQuestion Answering | —Unverified | 0 |
| Curriculum Audiovisual Learning | Jan 26, 2020 | Clusteringcross-modal alignment | —Unverified | 0 |
| A coupled autoencoder approach for multi-modal analysis of cell types | Nov 6, 2019 | Clusteringcross-modal alignment | CodeCode Available | 0 |
| ACMM: Aligned Cross-Modal Memory for Few-Shot Image and Sentence Matching | Oct 1, 2019 | cross-modal alignmentSentence | —Unverified | 0 |
| Mix and match networks: cross-modal alignment for zero-pair image-to-image translation | Mar 8, 2019 | cross-modal alignmentDecoder | —Unverified | 0 |
| Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces | May 18, 2018 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 |