SOTAVerified

Zero-Shot Environment Sound Classification

Papers

Showing 1-6 of 6 papers

Title | Status | Hype
ImageBind: One Embedding Space To Bind Them All | Code | 5
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Code | 4
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
AudioCLIP: Extending CLIP to Image, Text and Audio | Code | 2
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research | Code | 2
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | - | 0

No leaderboard results yet.