SOTAVerified

cross-modal alignment

Papers

Showing 226–250 of 342 papers

Title | Status | Hype
OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection | - | 0
Navigating Open Set Scenarios for Skeleton-based Action Recognition | Code | 1
Progressive Multi-Modality Learning for Inverse Protein Folding | Code | 1
PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features | - | 0
DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation | - | 0
MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval | - | 0
Video Referring Expression Comprehension via Transformer with Content-conditioned Query | - | 0
On the Language Encoder of Contrastive Cross-modal Models | - | 0
Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation | - | 0
Robust Graph Matching Using An Unbalanced Hierarchical Optimal Transport Framework | Code | 0
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection | Code | 2
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks | Code | 1
Prototype-guided Cross-modal Completion and Alignment for Incomplete Text-based Person Re-identification | - | 0
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Code | 0
VDC: Versatile Data Cleanser based on Visual-Linguistic Inconsistency by Multimodal Large Language Models | Code | 1
Cross-modal Alignment with Optimal Transport for CTC-based ASR | - | 0
Sound Source Localization is All about Cross-Modal Alignment | - | 0
Multi-Semantic Fusion Model for Generalized Zero-Shot Skeleton-Based Action Recognition | Code | 1
Prompt-based Context- and Domain-aware Pretraining for Vision and Language Navigation | - | 0
Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images | - | 0
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models | Code | 1
Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation | Code | 1
DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment | - | 0
Language-Guided Diffusion Model for Visual Grounding | Code | 0
AerialVLN: Vision-and-Language Navigation for UAVs | Code | 2

No leaderboard results yet.