| Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation | Aug 2, 2023 | cross-modal alignment, Denoising | Code Available | 0 |
| Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval | Jul 18, 2023 | cross-modal alignment, Data Augmentation | Code Available | 1 |
| WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation | Jun 19, 2023 | cross-modal alignment, Image Segmentation | Unverified | 0 |
| Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models | Jun 15, 2023 | cross-modal alignment, Domain Generalization | Unverified | 0 |
| Global and Local Semantic Completion Learning for Vision-Language Pre-training | Jun 12, 2023 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning | May 31, 2023 | cross-modal alignment, Representation Learning | Code Available | 1 |
| SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | May 26, 2023 | cross-modal alignment, Object | Code Available | 1 |
| Improving speech translation by fusing speech and text | May 23, 2023 | cross-modal alignment, Machine Translation | Unverified | 0 |
| Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment | May 19, 2023 | cross-modal alignment, Emotion Recognition in Conversation | Unverified | 0 |
| Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training | May 13, 2023 | cross-modal alignment | Unverified | 0 |
| AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment | May 8, 2023 | cross-modal alignment, Rhythm | Unverified | 0 |
| Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining | Apr 26, 2023 | cross-modal alignment, Medical Visual Question Answering | Code Available | 1 |
| CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval | Apr 15, 2023 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 |
| Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation | Apr 6, 2023 | audio-visual learning, Contrastive Learning | Code Available | 1 |
| SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger | Mar 30, 2023 | cross-modal alignment, zero-shot-classification | Unverified | 0 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | Mar 28, 2023 | Action Classification, Action Recognition | Code Available | 0 |
| Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens | Mar 27, 2023 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment | Mar 10, 2023 | cross-modal alignment, Sign Language Recognition | Code Available | 1 |
| LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion | Mar 7, 2023 | 3D Object Detection, cross-modal alignment | Code Available | 0 |
| HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention | Mar 6, 2023 | cross-modal alignment | Code Available | 1 |
| TOT: Topology-Aware Optimal Transport For Multimodal Hate Detection | Feb 27, 2023 | cross-modal alignment | Unverified | 0 |
| End-to-end Semantic Object Detection with Cross-Modal Alignment | Feb 10, 2023 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| Does Vision Accelerate Hierarchical Generalization in Neural Language Learners? | Feb 1, 2023 | cross-modal alignment, Language Acquisition | Unverified | 0 |
| Improving Cross-modal Alignment for Text-Guided Image Inpainting | Jan 26, 2023 | cross-modal alignment, Image Inpainting | Unverified | 0 |
| Linguistic Query-Guided Mask Generation for Referring Image Segmentation | Jan 16, 2023 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | Dec 30, 2022 | cross-modal alignment, TGIF-Action | Unverified | 0 |
| MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation | Dec 19, 2022 | cross-modal alignment, Denoising | Code Available | 2 |
| SimVTP: Simple Video Text Pre-training with Masked Autoencoders | Dec 7, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 0 |
| Asymmetric Cross-Scale Alignment for Text-Based Person Search | Nov 26, 2022 | cross-modal alignment, Person Search | Code Available | 0 |
| Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | Nov 24, 2022 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| How do Cross-View and Cross-Modal Alignment Affect Representations in Contrastive Learning? | Nov 23, 2022 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training | Nov 21, 2022 | cross-modal alignment, GPU | Unverified | 0 |
| CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation | Nov 2, 2022 | cross-modal alignment, Decision Making | Code Available | 1 |
| Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision | Oct 24, 2022 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 |
| Fine-grained Semantic Alignment Network for Weakly Supervised Temporal Language Grounding | Oct 21, 2022 | cross-modal alignment, Sentence | Unverified | 0 |
| CLIP-Driven Fine-grained Text-Image Person Re-identification | Oct 19, 2022 | cross-modal alignment, Person Re-Identification | Code Available | 1 |
| Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation | Oct 18, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval | Oct 17, 2022 | cross-modal alignment, Object | Unverified | 0 |
| Low-resource Neural Machine Translation with Cross-modal Alignment | Oct 13, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning | Oct 12, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| Video Referring Expression Comprehension via Transformer with Content-aware Query | Oct 6, 2022 | cross-modal alignment, Referring Expression | Unverified | 0 |
| JPG - Jointly Learn to Align: Automated Disease Prediction and Radiology Report Generation | Oct 1, 2022 | cross-modal alignment, Disease Prediction | Unverified | 0 |
| Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection | Sep 28, 2022 | 2D Object Detection, cross-modal alignment | Unverified | 0 |
| TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval | Sep 28, 2022 | cross-modal alignment, Retrieval | Unverified | 0 |
| Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval | Sep 23, 2022 | cross-modal alignment, Information Retrieval | Unverified | 0 |
| OmniVL:One Foundation Model for Image-Language and Video-Language Tasks | Sep 15, 2022 | Action Classification, Action Recognition | Unverified | 0 |
| Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Aug 29, 2022 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| See What You See: Self-supervised Cross-modal Retrieval of Visual Stimuli from Brain Activity | Aug 7, 2022 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 |
| Fine-Grained Semantically Aligned Vision-Language Pre-Training | Aug 4, 2022 | cross-modal alignment, object-detection | Code Available | 1 |
| Masked Vision and Language Modeling for Multi-modal Representation Learning | Aug 3, 2022 | cross-modal alignment, Language Modeling | Unverified | 0 |