SOTAVerified

cross-modal alignment

Papers

Showing 61–70 of 342 papers

Title | Status | Hype
SE4Lip: Speech-Lip Encoder for Talking Head Synthesis to Solve Phoneme-Viseme Alignment Ambiguity | | 0
Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification | Code | 0
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision | Code | 1
FineLIP: Extending CLIP's Reach via Fine-Grained Alignment with Longer Text Inputs | | 0
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval | | 0
COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking | | 0
DF-Calib: Targetless LiDAR-Camera Calibration via Depth Flow | | 0
SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering | | 0
CADFormer: Fine-Grained Cross-modal Alignment and Decoding Transformer for Referring Remote Sensing Image Segmentation | | 0
BiPVL-Seg: Bidirectional Progressive Vision-Language Fusion with Global-Local Alignment for Medical Image Segmentation | Code | 1

No leaderboard results yet.