SOTAVerified

text-to-audiovisual retrieval

Papers

Showing 1-2 of 2 papers

Title | Status | Hype
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | Code | 2
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Code | 2

No leaderboard results yet.