| A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models | Apr 22, 2025 | cross-modal alignment, Script Generation | Unverified | 0 |
| Cross-attention for State-based model RWKV-7 | Apr 19, 2025 | cross-modal alignment, Image Generation | Unverified | 0 |
| TMCIR: Token Merge Benefits Composed Image Retrieval | Apr 15, 2025 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| 3D CoCa: Contrastive Learners are 3D Captioners | Apr 13, 2025 | 3D dense captioning, Caption Generation | Code Available | 0 |
| InfoMAE: Pair-Efficient Cross-Modal Alignment for Multimodal Time-Series Sensing Signals | Apr 13, 2025 | cross-modal alignment, Self-Supervised Learning | Unverified | 0 |
| VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering | Apr 11, 2025 | cross-modal alignment, Information Retrieval | Unverified | 0 |
| SE4Lip: Speech-Lip Encoder for Talking Head Synthesis to Solve Phoneme-Viseme Alignment Ambiguity | Apr 8, 2025 | 3DGS, cross-modal alignment | Unverified | 0 |
| Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification | Apr 8, 2025 | cross-modal alignment, Image Classification | Code Available | 0 |
| Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval | Apr 2, 2025 | cross-modal alignment, Retrieval | Unverified | 0 |
| DF-Calib: Targetless LiDAR-Camera Calibration via Depth Flow | Apr 2, 2025 | Autonomous Driving, Camera Calibration | Unverified | 0 |
| COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking | Apr 2, 2025 | cross-modal alignment, Object | Unverified | 0 |
| FineLIP: Extending CLIP's Reach via Fine-Grained Alignment with Longer Text Inputs | Apr 2, 2025 | cross-modal alignment, Cross-Modal Retrieval | Unverified | 0 |
| SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering | Apr 1, 2025 | cross-modal alignment, Question Answering | Unverified | 0 |
| CADFormer: Fine-Grained Cross-modal Alignment and Decoding Transformer for Referring Remote Sensing Image Segmentation | Mar 30, 2025 | cross-modal alignment, Image Segmentation | Unverified | 0 |
| NeuroLIP: Interpretable and Fair Cross-Modal Alignment of fMRI and Phenotypic Text | Mar 27, 2025 | Attribute, Contrastive Learning | Unverified | 0 |
| AutoRad-Lung: A Radiomic-Guided Prompting Autoregressive Vision-Language Model for Lung Nodule Malignancy Prediction | Mar 26, 2025 | Computed Tomography (CT), cross-modal alignment | Unverified | 0 |
| GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations | Mar 26, 2025 | cross-modal alignment, Emotion Classification | Unverified | 0 |
| LangBridge: Interpreting Image as a Combination of Language Embeddings | Mar 25, 2025 | cross-modal alignment | Unverified | 0 |
| Language-based Image Colorization: A Benchmark and Beyond | Mar 19, 2025 | Benchmarking, Colorization | Code Available | 0 |
| Shushing! Let's Imagine an Authentic Speech from the Silent Video | Mar 19, 2025 | cross-modal alignment, Language Modeling | Unverified | 0 |
| Observation-Graph Interaction and Key-Detail Guidance for Vision and Language Navigation | Mar 14, 2025 | cross-modal alignment, Navigate | Unverified | 0 |
| Technical Approach for the EMI Challenge in the 8th Affective Behavior Analysis in-the-Wild Competition | Mar 13, 2025 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| 4D-ACFNet: A 4D Attention Mechanism-Based Prognostic Framework for Colorectal Cancer Liver Metastasis Integrating Multimodal Spatiotemporal Features | Mar 12, 2025 | cross-modal alignment, Disentanglement | Unverified | 0 |
| Hierarchical Cross-Modal Alignment for Open-Vocabulary 3D Object Detection | Mar 10, 2025 | 3D Object Detection, cross-modal alignment | Unverified | 0 |
| LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition? | Mar 10, 2025 | cross-modal alignment | Unverified | 0 |
| OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection | Mar 9, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems | Mar 6, 2025 | cross-modal alignment | Code Available | 0 |
| Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data | Mar 3, 2025 | cross-modal alignment, Style Transfer | Unverified | 0 |
| Language Model Mapping in Multimodal Music Learning: A Grand Challenge Proposal | Mar 1, 2025 | cross-modal alignment, Language Modeling | Unverified | 0 |
| UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting | Feb 25, 2025 | 3DGS, cross-modal alignment | Unverified | 0 |
| DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications | Feb 24, 2025 | cross-modal alignment, Earth Observation | Unverified | 0 |
| MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model | Feb 23, 2025 | cross-modal alignment, Language Modeling | Code Available | 0 |
| CardiacMamba: A Multimodal RGB-RF Fusion Framework with State Space Models for Remote Physiological Measurement | Feb 19, 2025 | cross-modal alignment, Fairness | Code Available | 0 |
| NOTA: Multimodal Music Notation Understanding for Visual Large Language Model | Feb 17, 2025 | cross-modal alignment, Language Modeling | Unverified | 0 |
| A Survey of Automatic Prompt Engineering: An Optimization Perspective | Feb 17, 2025 | cross-modal alignment, Prompt Engineering | Unverified | 0 |
| MDE: Modality Discrimination Enhancement for Multi-modal Recommendation | Feb 8, 2025 | cross-modal alignment, Multi-modal Recommendation | Unverified | 0 |
| Leveraging Pre-Trained Models for Multimodal Class-Incremental Learning under Adaptive Fusion | Feb 7, 2025 | class-incremental learning, Class Incremental Learning | Unverified | 0 |
| Cross-modal Context Fusion and Adaptive Graph Convolutional Network for Multimodal Conversational Emotion Recognition | Jan 25, 2025 | cross-modal alignment, Emotion Classification | Unverified | 0 |
| Integrate Temporal Graph Learning into LLM-based Temporal Knowledge Graph Model | Jan 21, 2025 | cross-modal alignment, Graph Embedding | Unverified | 0 |
| CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection | Jan 8, 2025 | Computational Efficiency, cross-modal alignment | Unverified | 0 |
| Audio-Visual Semantic Graph Network for Audio-Visual Event Localization | Jan 1, 2025 | audio-visual event localization, cross-modal alignment | Unverified | 0 |
| Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation | Jan 1, 2025 | Classification, cross-modal alignment | Unverified | 0 |
| Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment | Jan 1, 2025 | Attribute, cross-modal alignment | Unverified | 0 |
| Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment | Dec 30, 2024 | cross-modal alignment, Emotion Recognition | Unverified | 0 |
| ChartAdapter: Large Vision-Language Model for Chart Summarization | Dec 30, 2024 | Chart Understanding, cross-modal alignment | Unverified | 0 |
| Enhancing Visual Representation for Text-based Person Searching | Dec 30, 2024 | cross-modal alignment, Person Search | Code Available | 0 |
| Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data | Dec 19, 2024 | AutoML, cross-modal alignment | Unverified | 0 |
| RAC3: Retrieval-Augmented Corner Case Comprehension for Autonomous Driving with Vision-Language Models | Dec 15, 2024 | Autonomous Driving, Contrastive Learning | Unverified | 0 |
| Wearable Accelerometer Foundation Models for Health via Knowledge Distillation | Dec 15, 2024 | Activity Recognition, cross-modal alignment | Unverified | 0 |
| Dynamic Cross-Modal Alignment for Robust Semantic Location Prediction | Dec 13, 2024 | cross-modal alignment, Prediction | Unverified | 0 |