SOTAVerified

Audio captioning

Audio captioning is the task of describing audio content in natural-language text. The usual approach pairs an audio encoder (for example PANN or CAV-MAE) with a text decoder (for example a transformer) that generates the caption. Caption quality is commonly scored with machine translation metrics (BLEU, METEOR, ROUGE) and image captioning metrics (SPICE, CIDEr), although these metrics are not well suited to audio captions. Metrics based on pretrained language models, such as Sentence-BERT, have also been tried.
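One way to see why n-gram overlap metrics fit audio captions poorly is a minimal sketch of clipped n-gram precision, the building block of BLEU. This is a simplified illustration, not full BLEU (no brevity penalty, no averaging over n-gram orders), and the function names and example captions are made up for the purpose.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate caption against one reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference. Tokenization is plain whitespace splitting.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical captions, invented for illustration only.
reference = "a dog barks while cars pass by"
paraphrase = "traffic noise with a canine barking"  # same scene, different words
near_copy = "a dog barks while birds sing"          # partly different scene, shared words

paraphrase_score = clipped_precision(paraphrase, reference)
near_copy_score = clipped_precision(near_copy, reference)
print(paraphrase_score, near_copy_score)
```

Under these assumptions the paraphrase shares only one word with the reference and scores far lower than the caption that reuses the reference wording but describes a partly different sound scene. An embedding-based metric aims to close that gap by comparing meaning rather than surface tokens.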

Papers

Showing 76–100 of 119 papers

Title | Status | Hype
Automated Audio Captioning via Fusion of Low- and High- Dimensional Features | - | 0
Text-to-Audio Grounding Based Novel Metric for Evaluating Audio Caption Similarity | - | 0
Audio Retrieval with WavText5K and CLAP Training | Code | 1
Language-based Audio Retrieval Task in DCASE 2022 Challenge | - | 0
An investigation on selecting audio pre-trained models for audio captioning | - | 0
Automated Audio Captioning and Language-Based Audio Retrieval | Code | 0
Language-based Audio Retrieval Task in DCASE 2022 Challenge | Code | 0
Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning | - | 0
Multimodal Knowledge Alignment with Reinforcement Learning | Code | 1
Automated Audio Captioning: An Overview of Recent Progress and New Challenges | - | 0
Caption Feature Space Regularization for Audio Captioning | Code | 0
Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning | - | 0
Leveraging Pre-trained BERT for Audio Captioning | - | 0
Joint Speech Recognition and Audio Captioning | - | 0
Automatic Audio Captioning using Attention weighted Event based Embeddings | - | 0
Local Information Assisted Attention-free Decoder for Audio Captioning | Code | 0
Audio Retrieval with Natural Language Queries: A Benchmark Study | Code | 1
AUTOMATED AUDIO CAPTIONING BY FINE-TUNING BART WITH AUDIOSET TAGS | Code | 0
Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning | - | 0
Diverse Audio Captioning via Adversarial Training | - | 0
Can Audio Captions Be Evaluated with Image Caption Metrics? | Code | 1
Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization | - | 0
An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning | Code | 1
Audio Captioning Transformer | Code | 1
CL4AC: A Contrastive Loss for Audio Captioning | Code | 1

Benchmark Results

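# | Model | Metric | Claimed | Verified | Status
1 | VAST | CIDEr | 0.78 | - | Unverified
2 | VALOR | CIDEr | 0.74 | - | Unverified
3 | MQ-Cap | SPIDEr | 0.52 | - | Unverified
4 | SLAM-AAC | SPIDEr | 0.52 | - | Unverified
5 | LAVCap | SPIDEr | 0.52 | - | Unverified
6 | EnCLAP++-large | SPIDEr | 0.51 | - | Unverified
7 | AutoCap | SPIDEr | 0.51 | - | Unverified
8 | LOAE | SPIDEr | 0.51 | - | Unverified
9 | EnCLAP++-base | SPIDEr | 0.5 | - | Unverified
10 | EnCLAP-large | SPIDEr | 0.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | CIDEr | 0.52 | - | Unverified
2 | VALOR | CIDEr | 0.42 | - | Unverified
3 | SLAM-AAC | SPIDEr | 0.33 | - | Unverified
4 | LOAE | SPIDEr | 0.33 | - | Unverified
5 | MQ-Cap | SPIDEr | 0.32 | - | Unverified
6 | Ensemble | SPIDEr | 0.32 | - | Unverified
7 | Audio Flamingo (Pengi trainset) | SPIDEr | 0.31 | - | Unverified
8 | Ensemble-RL | SPIDEr | 0.3 | - | Unverified
9 | Qwen-Audio | SPIDEr | 0.29 | - | Unverified
10 | Ensemble | SPIDEr | 0.21 | - | Unverified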