| HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | Dec 30, 2022 | cross-modal alignment, TGIF-Action | Unverified | 0 |
| MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation | Dec 19, 2022 | cross-modal alignment, Denoising | Code Available | 2 |
| SimVTP: Simple Video Text Pre-training with Masked Autoencoders | Dec 7, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 0 |
| Asymmetric Cross-Scale Alignment for Text-Based Person Search | Nov 26, 2022 | cross-modal alignment, Person Search | Code Available | 0 |
| Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | Nov 24, 2022 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| How do Cross-View and Cross-Modal Alignment Affect Representations in Contrastive Learning? | Nov 23, 2022 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training | Nov 21, 2022 | cross-modal alignment, GPU | Unverified | 0 |
| CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation | Nov 2, 2022 | cross-modal alignment, Decision Making | Code Available | 1 |
| Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision | Oct 24, 2022 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 |
| Fine-grained Semantic Alignment Network for Weakly Supervised Temporal Language Grounding | Oct 21, 2022 | cross-modal alignment, Sentence | Unverified | 0 |
| CLIP-Driven Fine-grained Text-Image Person Re-identification | Oct 19, 2022 | cross-modal alignment, Person Re-Identification | Code Available | 1 |
| Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation | Oct 18, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval | Oct 17, 2022 | cross-modal alignment, Object | Unverified | 0 |
| Low-resource Neural Machine Translation with Cross-modal Alignment | Oct 13, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning | Oct 12, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| Video Referring Expression Comprehension via Transformer with Content-aware Query | Oct 6, 2022 | cross-modal alignment, Referring Expression | Unverified | 0 |
| JPG - Jointly Learn to Align: Automated Disease Prediction and Radiology Report Generation | Oct 1, 2022 | cross-modal alignment, Disease Prediction | Unverified | 0 |
| Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection | Sep 28, 2022 | 2D Object Detection, cross-modal alignment | Unverified | 0 |
| TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval | Sep 28, 2022 | cross-modal alignment, Retrieval | Unverified | 0 |
| Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval | Sep 23, 2022 | cross-modal alignment, Information Retrieval | Unverified | 0 |
| OmniVL:One Foundation Model for Image-Language and Video-Language Tasks | Sep 15, 2022 | Action Classification, Action Recognition | Unverified | 0 |
| Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Aug 29, 2022 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| See What You See: Self-supervised Cross-modal Retrieval of Visual Stimuli from Brain Activity | Aug 7, 2022 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 |
| Fine-Grained Semantically Aligned Vision-Language Pre-Training | Aug 4, 2022 | cross-modal alignment, object-detection | Code Available | 1 |
| Masked Vision and Language Modeling for Multi-modal Representation Learning | Aug 3, 2022 | cross-modal alignment, Language Modeling | Unverified | 0 |