SOTAVerified

cross-modal alignment

Papers

Showing 326–342 of 342 papers

| Title | Status | Hype |
| --- | --- | --- |
| OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All | | 0 |
| OmniVL:One Foundation Model for Image-Language and Video-Language Tasks | | 0 |
| OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities | | 0 |
| On the Language Encoder of Contrastive Cross-modal Models | | 0 |
| OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection | | 0 |
| OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection | | 0 |
| PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing | | 0 |
| PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features | | 0 |
| Prompt-based Context- and Domain-aware Pretraining for Vision and Language Navigation | | 0 |
| Prototype-guided Cross-modal Completion and Alignment for Incomplete Text-based Person Re-identification | | 0 |
| RAC3: Retrieval-Augmented Corner Case Comprehension for Autonomous Driving with Vision-Language Models | | 0 |
| Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos | | 0 |
| Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval | | 0 |
| Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models | | 0 |
| Revisiting Misalignment in Multispectral Pedestrian Detection: A Language-Driven Approach for Cross-modal Alignment Fusion | | 0 |
| Scene-Intuitive Agent for Remote Embodied Visual Grounding | | 0 |
| SE4Lip: Speech-Lip Encoder for Talking Head Synthesis to Solve Phoneme-Viseme Alignment Ambiguity | | 0 |

No leaderboard results yet.