SOTAVerified

Retrieval-augmented Few-shot In-context Audio Captioning

Retrieval-augmented few-shot in-context audio captioning is a specialized approach to audio captioning. It applies few-shot in-context learning, as used in large language models (LLMs), to generate textual descriptions of audio without training on the target dataset. At inference time, a retrieval step selects a few examples from the training data, and these are presented to the model in-context together with the query audio. The model conditions on these retrieved examples to caption the new clip.
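The inference-time flow described above can be illustrated with a minimal sketch. It is not taken from any of the listed papers: the datastore layout, the toy three-dimensional embeddings, the helper names `retrieve_examples` and `build_prompt`, and the text placeholders standing in for audio inputs are all assumptions made for illustration. A real system would obtain embeddings from an audio encoder and pass the actual audio of each retrieved clip to an audio language model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_examples(query_emb, datastore, k=2):
    """Return the k datastore items most similar to the query embedding."""
    ranked = sorted(
        datastore,
        key=lambda item: cosine(query_emb, item["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(examples, query_tag="<query audio>"):
    """Lay out retrieved (audio, caption) pairs as shots, then the open query."""
    lines = []
    for ex in examples:
        lines.append(f"Audio: <{ex['id']}>")  # placeholder for the clip itself
        lines.append(f"Caption: {ex['caption']}")
    lines.append(f"Audio: {query_tag}")
    lines.append("Caption:")  # left open for the model to complete
    return "\n".join(lines)


# Hypothetical datastore built from training clips (toy embeddings).
datastore = [
    {"id": "clip_a", "embedding": [0.9, 0.1, 0.0],
     "caption": "A dog barks while cars pass by."},
    {"id": "clip_b", "embedding": [0.0, 1.0, 0.1],
     "caption": "Rain falls steadily on a roof."},
    {"id": "clip_c", "embedding": [0.8, 0.2, 0.1],
     "caption": "A dog growls and then barks."},
]

query_embedding = [1.0, 0.1, 0.0]  # assumed output of an audio encoder
shots = retrieve_examples(query_embedding, datastore, k=2)
prompt = build_prompt(shots)
print(prompt)
```

No model weights are updated at any point; only the retrieved examples change from query to query, which is why the approach can be applied to a dataset the model was not trained on.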

Papers

Showing 1-5 of 5 papers

| Title | Status | Hype |
| --- | --- | --- |
| Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities | Code | 5 |
| RECAP: Retrieval-Augmented Audio Captioning | Code | 1 |
| Prefix tuning for automated audio captioning | Code | 1 |
| Automated Audio Captioning by Fine-Tuning BART with AudioSet Tags | Code | 0 |
| Audio Captioning Transformer | Code | 1 |

No leaderboard results yet.