SOTAVerified

Zero-shot Audio Captioning

Zero-shot audio captioning aims to automatically generate descriptive text captions for audio content without any prior training on this task. Audio captioning typically concerns ambient sounds or sounds produced by a human performing an action.

Papers

Showing 1–6 of 6 papers

| Title | Status | Hype |
| --- | --- | --- |
| Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities | Code | 5 |
| Zero-shot audio captioning with audio-language model guidance and audio context keywords | Code | 1 |
| An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment | Code | 0 |
| DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning | Code | 0 |
| Classifier-Guided Captioning Across Modalities | | 0 |
| Zero-Shot Audio Captioning via Audibility Guidance | | 0 |

No leaderboard results yet.