SOTAVerified

cross-modal alignment

Papers

Showing 201-225 of 342 papers

| Title | Status | Hype |
| --- | --- | --- |
| ALN-P3: Unified Language Alignment for Perception, Prediction, and Planning in Autonomous Driving | | 0 |
| A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models | | 0 |
| A Survey of Automatic Prompt Engineering: An Optimization Perspective | | 0 |
| Audio-Visual Semantic Graph Network for Audio-Visual Event Localization | | 0 |
| AutoRad-Lung: A Radiomic-Guided Prompting Autoregressive Vision-Language Model for Lung Nodule Malignancy Prediction | | 0 |
| Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data | | 0 |
| Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models | | 0 |
| Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation | | 0 |
| Bridge Feature Matching and Cross-Modal Alignment with Mutual-filtering for Zero-shot Anomaly Detection | | 0 |
| CAD: A General Multimodal Framework for Video Deepfake Detection via Cross-Modal Alignment and Distillation | | 0 |
| CADFormer: Fine-Grained Cross-modal Alignment and Decoding Transformer for Referring Remote Sensing Image Segmentation | | 0 |
| CATVis: Context-Aware Thought Visualization | | 0 |
| CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection | | 0 |
| ChartAdapter: Large Vision-Language Model for Chart Summarization | | 0 |
| Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment | | 0 |
| CIRP: Cross-Item Relational Pre-training for Multimodal Product Bundling | | 0 |
| Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation | | 0 |
| CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | | 0 |
| Coarse-to-fine Alignment Makes Better Speech-image Retrieval | | 0 |
| Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection | | 0 |
| Context-Enhanced Video Moment Retrieval with Large Language Models | | 0 |
| Continual learning in cross-modal retrieval | | 0 |
| Continuous Sign Language Recognition Through Cross-Modal Alignment of Video and Text Embeddings in a Joint-Latent Space | | 0 |
| COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking | | 0 |
| CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval | | 0 |

No leaderboard results yet.