SOTAVerified

Audio-Visual Captioning

Papers

Showing 4 of 4 papers

Title | Status | Hype
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Code | 2
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | Code | 2
AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning | Code | 1
LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport | Code | 1

No leaderboard results yet.