| Cross-Modal Alignment Learning of Vision-Language Conceptual Systems | Jul 31, 2022 | cross-modal alignment, Representation Learning | Unverified | 0 |
| A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues | Jul 24, 2022 | cross-modal alignment, Trajectory Planning | Code Available | 0 |
| BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning | Jun 17, 2022 | cross-modal alignment, Representation Learning | Code Available | 1 |
| VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix | Jun 17, 2022 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | May 24, 2022 | Computational Efficiency, cross-modal alignment | Code Available | 1 |
| Reinforced Cross-modal Alignment for Radiology Report Generation | May 1, 2022 | cross-modal alignment, Decision Making | Code Available | 0 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Apr 18, 2022 | cross-modal alignment, Document AI | Code Available | 0 |
| DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors | Apr 6, 2022 | 3D geometry, 3D Object Detection | Code Available | 1 |
| Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding | Apr 4, 2022 | cross-modal alignment, Natural Language Queries | Code Available | 1 |
| Vision-Language Pre-Training with Triple Contrastive Learning | Feb 21, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 2 |
| mSLAM: Massively multilingual joint pre-training for speech and text | Feb 3, 2022 | cross-modal alignment, intent-classification | Unverified | 0 |
| ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding | Jan 16, 2022 | cross-modal alignment, Document Classification | Code Available | 0 |
| KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | Jan 16, 2022 | cross-modal alignment, Knowledge Distillation | Unverified | 0 |
| Align and Prompt: Video-and-Language Pre-training with Entity Prompts | Dec 17, 2021 | cross-modal alignment, Entity Alignment | Code Available | 1 |
| Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision | Dec 1, 2021 | cross-modal alignment, Navigate | Code Available | 1 |
| Learning Better Visual Representations for Weakly-Supervised Object Detection Using Natural Language Supervision | Sep 29, 2021 | cross-modal alignment, object-detection | Unverified | 0 |
| KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation | Sep 22, 2021 | cross-modal alignment, Knowledge Distillation | Code Available | 0 |
| Learning Joint Embedding with Modality Alignments for Cross-Modal Retrieval of Recipes and Food Images | Aug 9, 2021 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 |
| Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval | Aug 5, 2021 | cross-modal alignment, Retrieval | Unverified | 0 |
| Dynamic Modality Interaction Modeling for Image-Text Retrieval | Jul 11, 2021 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 1 |
| EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation | Jun 21, 2021 | 3D Semantic Segmentation, Autonomous Driving | Code Available | 1 |
| Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information | Apr 19, 2021 | cross-modal alignment, Navigate | Code Available | 0 |
| Continual learning in cross-modal retrieval | Apr 14, 2021 | Continual Learning, cross-modal alignment | Unverified | 0 |
| Scene-Intuitive Agent for Remote Embodied Visual Grounding | Mar 24, 2021 | cross-modal alignment, Navigate | Unverified | 0 |
| Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze | Nov 9, 2020 | cross-modal alignment, Image Captioning | Code Available | 0 |