SOTAVerified

Audio-Visual Captioning

Papers

Showing 4 of 4 papers

Title | Status | Hype
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | Code | 2
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Code | 2
LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport | Code | 1
AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning | Code | 1

No leaderboard results yet.