SOTAVerified

cross-modal alignment

Papers

Showing 211-220 of 342 papers

| Title | Status | Hype |
| --- | --- | --- |
| Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding | | 0 |
| Modeling the Human Visual System: Comparative Insights from Response-Optimized and Task-Optimized Vision Models, Language Models, and different Readout Mechanisms | | 0 |
| OMCAT: Omni Context Aware Transformer | | 0 |
| Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective | Code | 0 |
| EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment | | 0 |
| Intriguing Properties of Large Language and Vision Models | | 0 |
| TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation | | 0 |
| Fully Aligned Network for Referring Image Segmentation | | 0 |
| Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training | | 0 |
| TS-HTFA: Advancing Time Series Forecasting via Hierarchical Text-Free Alignment with Large Language Models | | 0 |

No leaderboard results yet.