SOTAVerified

cross-modal alignment

Papers

Showing 76–100 of 342 papers

Title | Status | Hype
Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations | Code | 1
CoMP: Continual Multimodal Pre-training for Vision Foundation Models | Code | 1
Language-based Image Colorization: A Benchmark and Beyond | Code | 0
Shushing! Let's Imagine an Authentic Speech from the Silent Video | - | 0
Observation-Graph Interaction and Key-Detail Guidance for Vision and Language Navigation | - | 0
Technical Approach for the EMI Challenge in the 8th Affective Behavior Analysis in-the-Wild Competition | - | 0
4D-ACFNet: A 4D Attention Mechanism-Based Prognostic Framework for Colorectal Cancer Liver Metastasis Integrating Multimodal Spatiotemporal Features | - | 0
Hierarchical Cross-Modal Alignment for Open-Vocabulary 3D Object Detection | - | 0
LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition? | - | 0
OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection | - | 0
GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images | Code | 3
RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems | Code | 0
Cross-modal Causal Relation Alignment for Video Question Grounding | Code | 1
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data | - | 0
Language Model Mapping in Multimodal Music Learning: A Grand Challenge Proposal | - | 0
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting | - | 0
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding | Code | 1
DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications | - | 0
MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model | Code | 0
CrossOver: 3D Scene Cross-Modal Alignment | Code | 3
CardiacMamba: A Multimodal RGB-RF Fusion Framework with State Space Models for Remote Physiological Measurement | Code | 0
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model | - | 0
A Survey of Automatic Prompt Engineering: An Optimization Perspective | - | 0
Phantom: Subject-consistent video generation via cross-modal alignment | Code | 5
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation | Code | 3
Page 4 of 14

No leaderboard results yet.