SOTAVerified

cross-modal alignment

Papers

Showing 101-110 of 342 papers

Title | Status | Hype
Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation | Code | 1
RSRefSeg 2: Decoupling Referring Remote Sensing Image Segmentation with Foundation Models | Code | 1
Mask Grounding for Referring Image Segmentation | Code | 1
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning | Code | 1
EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation | Code | 1
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | Code | 1
Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection | | 0
Coarse-to-fine Alignment Makes Better Speech-image Retrieval | | 0
A Survey of Automatic Prompt Engineering: An Optimization Perspective | | 0
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | | 0

No leaderboard results yet.