SOTAVerified

Retrieval-augmented Few-shot In-context Audio Captioning

Retrieval-augmented few-shot in-context audio captioning is a specialized approach to audio captioning. It applies few-shot in-context learning, as used in large language models (LLMs), to generate textual descriptions of audio content without training the model on the target dataset. Instead, at inference time, a retrieval step selects a few examples from the training data and presents them to the model in-context. The model conditions on these examples to produce an accurate, contextually relevant caption for the new audio clip.
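The retrieval step described above can be illustrated with a minimal sketch. It assumes that audio clips have already been mapped to embedding vectors by some audio encoder (not shown), that similarity is measured with cosine similarity, and that the retrieved captions are placed in a plain-text prompt. The function names, the toy datastore, and the prompt wording are all hypothetical and are not taken from any of the papers listed here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_examples(query_emb, datastore, k=2):
    """Return the k datastore items most similar to the query embedding."""
    ranked = sorted(
        datastore,
        key=lambda item: cosine(query_emb, item["embedding"]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(examples):
    """Place the retrieved captions in-context ahead of the new clip."""
    lines = ["Describe the audio clip in one sentence.", ""]
    for i, ex in enumerate(examples, start=1):
        lines.append(f"Example {i}: {ex['caption']}")
    lines.append("Caption for the new clip:")
    return "\n".join(lines)

# Toy datastore standing in for (embedding, caption) pairs from training data.
# The vectors are made up; a real system would use an audio encoder.
datastore = [
    {"embedding": [0.9, 0.1, 0.0], "caption": "A dog barks while cars pass by."},
    {"embedding": [0.0, 1.0, 0.1], "caption": "Rain falls steadily on a metal roof."},
    {"embedding": [0.8, 0.2, 0.1], "caption": "A dog growls and then barks loudly."},
]

query_emb = [1.0, 0.1, 0.0]  # hypothetical embedding of the clip to caption
examples = retrieve_examples(query_emb, datastore, k=2)
prompt = build_prompt(examples)
print(prompt)
```

In this sketch the captioning model itself is left out: the resulting prompt, together with the audio features, would be passed to a model capable of in-context learning, and no model weights are updated.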

Title	Date	Tasks	Status	Hype
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities	Feb 2, 2024	Acoustic Scene Classification, Audio captioning	Code Available	5
RECAP: Retrieval-Augmented Audio Captioning	Sep 18, 2023	AudioCaps, Audio captioning	Code Available	1
Prefix tuning for automated audio captioning	Mar 30, 2023	AudioCaps, Audio captioning	Code Available	1
AUTOMATED AUDIO CAPTIONING BY FINE-TUNING BART WITH AUDIOSET TAGS	Nov 15, 2021	AudioCaps, Audio captioning	Code Available	0
Audio Captioning Transformer	Jul 21, 2021	AudioCaps, Audio captioning	Code Available	1

No leaderboard results yet.
