SOTAVerified

cross-modal alignment

Papers

Showing 201-250 of 342 papers

Title | Status | Hype
Audio-Visual Semantic Graph Network for Audio-Visual Event Localization |  | 0
AutoRad-Lung: A Radiomic-Guided Prompting Autoregressive Vision-Language Model for Lung Nodule Malignancy Prediction |  | 0
Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data |  | 0
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models |  | 0
Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation |  | 0
Bridge Feature Matching and Cross-Modal Alignment with Mutual-filtering for Zero-shot Anomaly Detection |  | 0
CAD: A General Multimodal Framework for Video Deepfake Detection via Cross-Modal Alignment and Distillation |  | 0
CADFormer: Fine-Grained Cross-modal Alignment and Decoding Transformer for Referring Remote Sensing Image Segmentation |  | 0
CATVis: Context-Aware Thought Visualization |  | 0
CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection |  | 0
ChartAdapter: Large Vision-Language Model for Chart Summarization |  | 0
Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment |  | 0
CIRP: Cross-Item Relational Pre-training for Multimodal Product Bundling |  | 0
Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation |  | 0
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance |  | 0
Coarse-to-fine Alignment Makes Better Speech-image Retrieval |  | 0
Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection |  | 0
Context-Enhanced Video Moment Retrieval with Large Language Models |  | 0
Continual learning in cross-modal retrieval |  | 0
Continuous Sign Language Recognition Through Cross-Modal Alignment of Video and Text Embeddings in a Joint-Latent Space |  | 0
CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval |  | 0
Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation |  | 0
Cross-Modal Alignment Learning of Vision-Language Conceptual Systems |  | 0
Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation |  | 0
Cross-modal Alignment with Optimal Transport for CTC-based ASR |  | 0
Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval |  | 0
Cross-modal Context Fusion and Adaptive Graph Convolutional Network for Multimodal Conversational Emotion Recognition |  | 0
Cross-Modal Cross-Domain Moment Alignment Network for Person Search |  | 0
Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval |  | 0
Cross-Modal Prototype based Multimodal Federated Learning under Severely Missing Modality |  | 0
Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval |  | 0
CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis |  | 0
Curriculum Audiovisual Learning |  | 0
DALR: Dual-level Alignment Learning for Multimodal Sentence Representation Learning |  | 0
DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation |  | 0
Towards Brain Passage Retrieval -- An Investigation of EEG Query Representations |  | 0
Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model |  | 0
Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing |  | 0
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding |  | 0
Detection-based Intermediate Supervision for Visual Question Answering |  | 0
DF-Calib: Targetless LiDAR-Camera Calibration via Depth Flow |  | 0
DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment |  | 0
DiSa: Directional Saliency-Aware Prompt Learning for Generalizable Vision-Language Models |  | 0
Disentangled Noisy Correspondence Learning |  | 0
Does Vision Accelerate Hierarchical Generalization in Neural Language Learners? |  | 0
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs |  | 0
Technical Approach for the EMI Challenge in the 8th Affective Behavior Analysis in-the-Wild Competition |  | 0
DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications |  | 0
Dynamic Cross-Modal Alignment for Robust Semantic Location Prediction |  | 0
EA-VTR: Event-Aware Video-Text Retrieval |  | 0
Page 5 of 7

No leaderboard results yet.