| Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training | Sep 25, 2024 | Classification, cross-modal alignment | Unverified | 0 |
| TS-HTFA: Advancing Time Series Forecasting via Hierarchical Text-Free Alignment with Large Language Models | Sep 23, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment | Sep 22, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Sep 20, 2024 | cross-modal alignment, Referring Expression | Code Available | 1 |
| OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities | Sep 17, 2024 | cross-modal alignment, Question Answering | Unverified | 0 |
| CAST: Cross-modal Alignment Similarity Test for Vision Language Models | Sep 17, 2024 | cross-modal alignment, Question Answering | Code Available | 0 |
| KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph | Sep 17, 2024 | cross-modal alignment, Image Captioning | Code Available | 0 |
| NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training | Sep 15, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization | Sep 12, 2024 | cross-modal alignment | Unverified | 0 |
| GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding | Sep 6, 2024 | cross-modal alignment, Language Modelling | Unverified | 0 |
| Temporal Order Preserved Optimal Transport-based Cross-modal Knowledge Transfer Learning for ASR | Sep 3, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Law of Vision Representation in MLLMs | Aug 29, 2024 | cross-modal alignment, Language Modeling | Code Available | 2 |
| Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding | Aug 29, 2024 | cross-modal alignment, Deep Learning | Code Available | 1 |
| A Survey on Facial Expression Recognition of Static and Dynamic Emotions | Aug 28, 2024 | cross-modal alignment, Facial Expression Recognition | Code Available | 1 |
| Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading | Aug 16, 2024 | Contrastive Learning, cross-modal alignment | Code Available | 0 |
| Coarse-to-fine Alignment Makes Better Speech-image Retrieval | Aug 15, 2024 | cross-modal alignment, Image Retrieval | Unverified | 0 |
| Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval | Aug 15, 2024 | cross-modal alignment, Denoising | Unverified | 0 |
| Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training | Aug 15, 2024 | cross-modal alignment | Code Available | 1 |
| Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation | Aug 14, 2024 | cross-modal alignment, Image Segmentation | Unverified | 0 |
| Disentangled Noisy Correspondence Learning | Aug 10, 2024 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 |
| Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach | Aug 2, 2024 | cross-modal alignment, Multiple Object Tracking | Code Available | 2 |
| Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Jul 26, 2024 | cross-modal alignment, image-classification | Unverified | 0 |
| DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction | Jul 25, 2024 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 0 |
| Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges | Jul 23, 2024 | cross-modal alignment, Fairness | Unverified | 0 |
| Craft: Cross-modal Aligned Features Improve Robustness of Prompt Tuning | Jul 22, 2024 | cross-modal alignment | Code Available | 0 |
| Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment | Jul 18, 2024 | cross-modal alignment, Cross-Modal Retrieval | Code Available | 1 |
| Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Jul 16, 2024 | Caption Generation, cross-modal alignment | Code Available | 1 |
| Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework | Jul 12, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| EA-VTR: Event-Aware Video-Text Retrieval | Jul 10, 2024 | Action Recognition, Contrastive Learning | Unverified | 0 |
| Towards Bridging the Cross-modal Semantic Gap for Multi-modal Recommendation | Jul 7, 2024 | cross-modal alignment, Multi-modal Recommendation | Code Available | 1 |
| Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval | Jul 1, 2024 | cross-modal alignment, Image Retrieval | Unverified | 0 |
| MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval | Jun 25, 2024 | cross-modal alignment, Moment Retrieval | Unverified | 0 |
| Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP | Jun 25, 2024 | cross-modal alignment, Image Classification | Code Available | 2 |
| PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes | Jun 19, 2024 | cross-modal alignment | Code Available | 1 |
| Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | Jun 12, 2024 | cross-modal alignment, Language Modelling | Code Available | 3 |
| It is Never Too Late to Mend: Separate Learning for Multimedia Recommendation | Jun 12, 2024 | cross-modal alignment, Multimedia recommendation | Code Available | 0 |
| MMPolymer: A Multimodal Multitask Pretraining Framework for Polymer Property Prediction | Jun 7, 2024 | cross-modal alignment, Prediction | Code Available | 1 |
| Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching | Jun 5, 2024 | cross-modal alignment, Image-text matching | Unverified | 0 |
| Multimodal Reasoning with Multimodal Knowledge Graph | Jun 4, 2024 | cross-modal alignment, Graph Attention | Unverified | 0 |
| Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection | Jun 2, 2024 | 3D Object Detection, cross-modal alignment | Code Available | 3 |
| DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | May 31, 2024 | cross-modal alignment, Visual Localization | Code Available | 2 |
| Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval | May 29, 2024 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment | May 28, 2024 | cross-modal alignment | Code Available | 2 |
| OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All | May 25, 2024 | All, cross-modal alignment | Unverified | 0 |
| Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation | May 23, 2024 | cross-modal alignment, Decoder | Code Available | 1 |
| AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability | May 23, 2024 | cross-modal alignment, Language Modelling | Unverified | 0 |
| Context-Enhanced Video Moment Retrieval with Large Language Models | May 21, 2024 | cross-modal alignment, Language Modeling | Unverified | 0 |
| Factual Serialization Enhancement: A Key Innovation for Chest X-ray Report Generation | May 15, 2024 | Contrastive Learning, cross-modal alignment | Code Available | 1 |
| Listen Then See: Video Alignment with Speaker Attention | Apr 21, 2024 | cross-modal alignment, Question Answering | Code Available | 0 |
| HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding | Apr 20, 2024 | cross-modal alignment, Visual Grounding | Code Available | 2 |