SOTAVerified

cross-modal alignment

Papers

Showing 251–300 of 342 papers

Title | Status | Hype
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models | | 0
Temporal Order Preserved Optimal Transport-based Cross-modal Knowledge Transfer Learning for ASR | | 0
Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge | | 0
TMCIR: Token Merge Benefits Composed Image Retrieval | | 0
TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval | | 0
TOT: Topology-Aware Optimal Transport For Multimodal Hate Detection | | 0
Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images | | 0
Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques | | 0
Transformer-based Spatial Grounding: A Comprehensive Survey | | 0
Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection | | 0
TSDASeg: A Two-Stage Model with Direct Alignment for Interactive Point Cloud Segmentation | | 0
TS-HTFA: Advancing Time Series Forecasting via Hierarchical Text-Free Alignment with Large Language Models | | 0
UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation | | 0
Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | | 0
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting | | 0
Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces | | 0
Video Referring Expression Comprehension via Transformer with Content-aware Query | | 0
Video Referring Expression Comprehension via Transformer with Content-conditioned Query | | 0
ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | | 0
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix | | 0
VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering | | 0
Wearable Accelerometer Foundation Models for Health via Knowledge Distillation | | 0
WhisQ: Cross-Modal Representation Learning for Text-to-Music MOS Prediction | | 0
mSLAM: Massively multilingual joint pre-training for speech and text | | 0
Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision | | 0
Multi-level Cross-modal Alignment for Image Clustering | | 0
Multi-modal Attribute Prompting for Vision-Language Models | | 0
Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval | | 0
Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges | | 0
Multimodal Reasoning with Multimodal Knowledge Graph | | 0
Multi-path Exploration and Feedback Adjustment for Text-to-Image Person Retrieval | | 0
Multi-Prompts Learning with Cross-Modal Alignment for Attribute-based Person Re-Identification | | 0
Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training | | 0
NeuroLIP: Interpretable and Fair Cross-Modal Alignment of fMRI and Phenotypic Text | | 0
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training | | 0
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model | | 0
Observation-Graph Interaction and Key-Detail Guidance for Vision and Language Navigation | | 0
OMCAT: Omni Context Aware Transformer | | 0
OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All | | 0
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | | 0
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities | | 0
On the Language Encoder of Contrastive Cross-modal Models | | 0
OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection | | 0
MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Code | 0
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base | Code | 0
Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation | Code | 0
Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification | Code | 0
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion | Code | 0
Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading | Code | 0
Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition | Code | 0

No leaderboard results yet.