cross-modal alignment

Papers

Showing 1–25 of 342 papers

Title	Date	Tasks	Status	Hype
Skywork-R1V3 Technical Report	Jul 8, 2025	cross-modal alignment, Mathematical Reasoning	Code Available	7
Phantom: Subject-consistent video generation via cross-modal alignment	Feb 16, 2025	cross-modal alignment, Human-Domain Subject-to-Video	Code Available	5
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation	Feb 12, 2025	cross-modal alignment, multimodal generation	Code Available	3
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams	Jun 12, 2024	cross-modal alignment, Language Modelling	Code Available	3
Ola: Pushing the Frontiers of Omni-Modal Language Model	Feb 6, 2025	cross-modal alignment, Language Modeling	Code Available	3
Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection	Jun 2, 2024	3D Object Detection, cross-modal alignment	Code Available	3
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning	May 15, 2025	cross-modal alignment, Geometry Problem Solving	Code Available	3
CrossOver: 3D Scene Cross-Modal Alignment	Feb 20, 2025	cross-modal alignment, Object	Code Available	3
GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images	Mar 8, 2025	cross-modal alignment, Diagnostic	Code Available	3
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams	Jun 30, 2025	cross-modal alignment, EgoSchema	Code Available	3
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation	Dec 19, 2022	cross-modal alignment, Denoising	Code Available	2
AerialVLN: Vision-and-Language Navigation for UAVs	Aug 13, 2023	cross-modal alignment, Navigate	Code Available	2
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection	Oct 4, 2023	3D Object Detection, cross-modal alignment	Code Available	2
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data	Feb 12, 2025	cross-modal alignment, Large Language Model	Code Available	2
Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP	Jun 25, 2024	cross-modal alignment, Image Classification	Code Available	2
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild	Apr 13, 2024	cross-modal alignment, Dynamic Facial Expression Recognition	Code Available	2
Law of Vision Representation in MLLMs	Aug 29, 2024	cross-modal alignment, Language Modeling	Code Available	2
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation	Jan 2, 2024	Audio Generation, cross-modal alignment	Code Available	2
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment	Jan 1, 2024	cross-modal alignment, Cross-Modal Retrieval	Code Available	2
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate	Oct 9, 2024	cross-modal alignment, Visual Question Answering	Code Available	2
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models	May 31, 2024	cross-modal alignment, Visual Localization	Code Available	2
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding	Apr 20, 2024	cross-modal alignment, Visual Grounding	Code Available	2
Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach	Aug 2, 2024	cross-modal alignment, Multiple Object Tracking	Code Available	2
Melody-Guided Music Generation	Sep 30, 2024	cross-modal alignment, Music Generation	Code Available	2
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment	Jul 3, 2025	cross-modal alignment, Instruction Following	Code Available	2

No leaderboard results yet.