| AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition | Oct 21, 2024 | cross-modal alignment, speech-recognition | Code Available | 1 |
| LESS: Label-Efficient and Single-Stage Referring 3D Segmentation | Oct 17, 2024 | cross-modal alignment, Instance Segmentation | Code Available | 1 |
| Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners | Oct 3, 2024 | cross-modal alignment | Code Available | 1 |
| MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Sep 20, 2024 | cross-modal alignment, Referring Expression | Code Available | 1 |
| Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding | Aug 29, 2024 | cross-modal alignment, Deep Learning | Code Available | 1 |
| A Survey on Facial Expression Recognition of Static and Dynamic Emotions | Aug 28, 2024 | cross-modal alignment, Facial Expression Recognition | Code Available | 1 |
| Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training | Aug 15, 2024 | cross-modal alignment | Code Available | 1 |
| Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment | Jul 18, 2024 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 1 |
| Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Jul 16, 2024 | Caption Generation, cross-modal alignment | Code Available | 1 |
| Towards Bridging the Cross-modal Semantic Gap for Multi-modal Recommendation | Jul 7, 2024 | cross-modal alignment, Multi-modal Recommendation | Code Available | 1 |
| PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes | Jun 19, 2024 | cross-modal alignment | Code Available | 1 |
| MMPolymer: A Multimodal Multitask Pretraining Framework for Polymer Property Prediction | Jun 7, 2024 | cross-modal alignment, Prediction | Code Available | 1 |
| Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval | May 29, 2024 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation | May 23, 2024 | cross-modal alignment, Decoder | Code Available | 1 |
| Factual Serialization Enhancement: A Key Innovation for Chest X-ray Report Generation | May 15, 2024 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition | Mar 2, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation | Feb 29, 2024 | cross-modal alignment, Multimodal Recommendation | Code Available | 1 |
| The Devil is in the Details: Boosting Guided Depth Super-Resolution via Rethinking Cross-Modal Alignment and Aggregation | Jan 16, 2024 | cross-modal alignment, feature selection | Code Available | 1 |
| Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment | Dec 25, 2023 | cross-modal alignment, Decoder | Code Available | 1 |
| BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction | Dec 22, 2023 | cross-modal alignment, EEG | Code Available | 1 |
| Mask Grounding for Referring Image Segmentation | Dec 19, 2023 | cross-modal alignment, Image Segmentation | Code Available | 1 |
| Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval | Dec 19, 2023 | cross-modal alignment, Moment Retrieval | Code Available | 1 |
| ViLA: Efficient Video-Language Alignment for Video Question Answering | Dec 13, 2023 | cross-modal alignment, Language Modeling | Code Available | 1 |
| Progressive Multi-Modality Learning for Inverse Protein Folding | Dec 11, 2023 | cross-modal alignment, Data Augmentation | Code Available | 1 |
| Navigating Open Set Scenarios for Skeleton-based Action Recognition | Dec 11, 2023 | Action Recognition, Activity Recognition | Code Available | 1 |
| ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks | Oct 4, 2023 | cross-modal alignment | Code Available | 1 |
| VDC: Versatile Data Cleanser based on Visual-Linguistic Inconsistency by Multimodal Large Language Models | Sep 28, 2023 | Backdoor Attack, cross-modal alignment | Code Available | 1 |
| Multi-Semantic Fusion Model for Generalized Zero-Shot Skeleton-Based Action Recognition | Sep 18, 2023 | Action Recognition, cross-modal alignment | Code Available | 1 |
| Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models | Aug 25, 2023 | cross-modal alignment, Position | Code Available | 1 |
| Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation | Aug 24, 2023 | cross-modal alignment, Descriptive | Code Available | 1 |
| Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval | Jul 18, 2023 | cross-modal alignment, Data Augmentation | Code Available | 1 |
| Global and Local Semantic Completion Learning for Vision-Language Pre-training | Jun 12, 2023 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning | May 31, 2023 | cross-modal alignment, Representation Learning | Code Available | 1 |
| SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | May 26, 2023 | cross-modal alignment, Object | Code Available | 1 |
| Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining | Apr 26, 2023 | cross-modal alignment, Medical Visual Question Answering | Code Available | 1 |
| Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation | Apr 6, 2023 | audio-visual learning, Contrastive Learning | Code Available | 1 |
| Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens | Mar 27, 2023 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment | Mar 10, 2023 | cross-modal alignment, Sign Language Recognition | Code Available | 1 |
| HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention | Mar 6, 2023 | cross-modal alignment | Code Available | 1 |
| Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | Nov 24, 2022 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation | Nov 2, 2022 | cross-modal alignment, Decision Making | Code Available | 1 |
| CLIP-Driven Fine-grained Text-Image Person Re-identification | Oct 19, 2022 | cross-modal alignment, Person Re-Identification | Code Available | 1 |
| Low-resource Neural Machine Translation with Cross-modal Alignment | Oct 13, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning | Oct 12, 2022 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Aug 29, 2022 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| Fine-Grained Semantically Aligned Vision-Language Pre-Training | Aug 4, 2022 | cross-modal alignment, object-detection | Code Available | 1 |
| BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning | Jun 17, 2022 | cross-modal alignment, Representation Learning | Code Available | 1 |
| mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | May 24, 2022 | Computational Efficiency, cross-modal alignment | Code Available | 1 |
| DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors | Apr 6, 2022 | 3D geometry, 3D Object Detection | Code Available | 1 |
| Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding | Apr 4, 2022 | cross-modal alignment, Natural Language Queries | Code Available | 1 |