SOTAVerified

cross-modal alignment

Papers

Showing 21-30 of 342 papers

Title | Status | Hype
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding | Code | 2
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild | Code | 2
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | Code | 2
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment | Code | 2
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection | Code | 2
AerialVLN: Vision-and-Language Navigation for UAVs | Code | 2
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation | Code | 2
Vision-Language Pre-Training with Triple Contrastive Learning | Code | 2
RSRefSeg 2: Decoupling Referring Remote Sensing Image Segmentation with Foundation Models | Code | 1
Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval | Code | 1

No leaderboard results yet.