SOTAVerified

cross-modal alignment

Papers

Showing 51–75 of 342 papers

Title | Status | Hype
Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models | Code | 0
PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing | - | 0
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment | Code | 1
MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Code | 0
A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models | - | 0
Cross-attention for State-based model RWKV-7 | Code | 0
TMCIR: Token Merge Benefits Composed Image Retrieval | - | 0
InfoMAE: Pair-Efficient Cross-Modal Alignment for Multimodal Time-Series Sensing Signals | - | 0
3D CoCa: Contrastive Learners are 3D Captioners | Code | 0
VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering | - | 0
SE4Lip: Speech-Lip Encoder for Talking Head Synthesis to Solve Phoneme-Viseme Alignment Ambiguity | - | 0
Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification | Code | 0
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision | Code | 1
FineLIP: Extending CLIP's Reach via Fine-Grained Alignment with Longer Text Inputs | - | 0
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval | - | 0
COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking | Code | 0
DF-Calib: Targetless LiDAR-Camera Calibration via Depth Flow | - | 0
SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering | - | 0
CADFormer: Fine-Grained Cross-modal Alignment and Decoding Transformer for Referring Remote Sensing Image Segmentation | - | 0
BiPVL-Seg: Bidirectional Progressive Vision-Language Fusion with Global-Local Alignment for Medical Image Segmentation | Code | 1
NeuroLIP: Interpretable and Fair Cross-Modal Alignment of fMRI and Phenotypic Text | - | 0
GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations | - | 0
AutoRad-Lung: A Radiomic-Guided Prompting Autoregressive Vision-Language Model for Lung Nodule Malignancy Prediction | - | 0
LangBridge: Interpreting Image as a Combination of Language Embeddings | - | 0
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation | Code | 1

No leaderboard results yet.