| Cross-Modal Alignment Learning of Vision-Language Conceptual Systems | Jul 31, 2022 | cross-modal alignment, Representation Learning | Unverified | 0 |
| A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues | Jul 24, 2022 | cross-modal alignment, Trajectory Planning | Code available | 0 |
| BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning | Jun 17, 2022 | cross-modal alignment, Representation Learning | Code available | 1 |
| VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix | Jun 17, 2022 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | May 24, 2022 | Computational Efficiency, cross-modal alignment | Code available | 1 |
| Reinforced Cross-modal Alignment for Radiology Report Generation | May 1, 2022 | cross-modal alignment, Decision Making | Code available | 0 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Apr 18, 2022 | cross-modal alignment, Document AI | Code available | 0 |
| DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors | Apr 6, 2022 | 3D geometry, 3D Object Detection | Code available | 1 |
| Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding | Apr 4, 2022 | cross-modal alignment, Natural Language Queries | Code available | 1 |
| Vision-Language Pre-Training with Triple Contrastive Learning | Feb 21, 2022 | Contrastive Learning, cross-modal alignment | Code available | 2 |
| mSLAM: Massively multilingual joint pre-training for speech and text | Feb 3, 2022 | cross-modal alignment, intent-classification | Unverified | 0 |
| ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding | Jan 16, 2022 | cross-modal alignment, Document Classification | Code available | 0 |
| KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | Jan 16, 2022 | cross-modal alignment, Knowledge Distillation | Unverified | 0 |
| Align and Prompt: Video-and-Language Pre-training with Entity Prompts | Dec 17, 2021 | cross-modal alignment, Entity Alignment | Code available | 1 |
| Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision | Dec 1, 2021 | cross-modal alignment, Navigate | Code available | 1 |
| Learning Better Visual Representations for Weakly-Supervised Object Detection Using Natural Language Supervision | Sep 29, 2021 | cross-modal alignment, object-detection | Unverified | 0 |
| KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | Sep 22, 2021 | cross-modal alignment, Knowledge Distillation | Code available | 0 |
| Learning Joint Embedding with Modality Alignments for Cross-Modal Retrieval of Recipes and Food Images | Aug 9, 2021 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 |
| Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval | Aug 5, 2021 | cross-modal alignment, Retrieval | Unverified | 0 |
| Dynamic Modality Interaction Modeling for Image-Text Retrieval | Jul 11, 2021 | cross-modal alignment, Cross-Modal Retrieval | Code available | 1 |
| EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation | Jun 21, 2021 | 3D Semantic Segmentation, Autonomous Driving | Code available | 1 |
| Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information | Apr 19, 2021 | cross-modal alignment, Navigate | Code available | 0 |
| Continual learning in cross-modal retrieval | Apr 14, 2021 | Continual Learning, cross-modal alignment | Unverified | 0 |
| Scene-Intuitive Agent for Remote Embodied Visual Grounding | Mar 24, 2021 | cross-modal alignment, Navigate | Unverified | 0 |
| Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze | Nov 9, 2020 | cross-modal alignment, Image Captioning | Code available | 0 |
| Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags | Oct 27, 2020 | cross-modal alignment, Representation Learning | Code available | 0 |
| ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding | Oct 23, 2020 | cross-modal alignment, Language Modeling | Unverified | 0 |
| Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos | Sep 18, 2020 | cross-modal alignment, reinforcement-learning | Unverified | 0 |
| Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation | Sep 17, 2020 | cross-modal alignment, Image to text | Unverified | 0 |
| DanceIt: Music-inspired Dancing Video Synthesis | Sep 17, 2020 | cross-modal alignment, Rhythm | Code available | 1 |
| Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation | Aug 4, 2020 | 2D Pose Estimation, 3D Human Pose Estimation | Code available | 0 |
| Symbiotic Adversarial Learning for Attribute-based Person Search | Jul 19, 2020 | Attribute, cross-modal alignment | Code available | 1 |
| Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm | Jun 3, 2020 | cross-modal alignment, General Classification | Unverified | 0 |
| Cross-Modal Cross-Domain Moment Alignment Network for Person Search | Jun 1, 2020 | cross-modal alignment, Person Search | Unverified | 0 |
| Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models | May 15, 2020 | coreference-resolution, Coreference Resolution | Unverified | 0 |
| Continuous Sign Language Recognition Through Cross-Modal Alignment of Video and Text Embeddings in a Joint-Latent Space | May 11, 2020 | cross-modal alignment, Decoder | Unverified | 0 |
| MCQA: Multimodal Co-attention Based Network for Question Answering | Apr 25, 2020 | cross-modal alignment, Question Answering | Unverified | 0 |
| Curriculum Audiovisual Learning | Jan 26, 2020 | Clustering, cross-modal alignment | Unverified | 0 |
| A coupled autoencoder approach for multi-modal analysis of cell types | Nov 6, 2019 | Clustering, cross-modal alignment | Code available | 0 |
| ACMM: Aligned Cross-Modal Memory for Few-Shot Image and Sentence Matching | Oct 1, 2019 | cross-modal alignment, Sentence | Unverified | 0 |
| Mix and match networks: cross-modal alignment for zero-pair image-to-image translation | Mar 8, 2019 | cross-modal alignment, Decoder | Unverified | 0 |
| Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces | May 18, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |