SOTAVerified

Audio captioning

Audio captioning is the task of describing audio content in text. The general approach is to encode the audio with an audio encoder (for example, PANN or CAV-MAE) and to generate the text with a decoder (for example, a transformer). Caption quality is commonly judged with machine translation metrics (BLEU, METEOR, ROUGE) and image captioning metrics (SPICE, CIDEr), although these are not well suited to the task. Attempts have been made to use metrics based on pretrained language models, such as Sentence-BERT.
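One way to see why n-gram overlap metrics can be a poor fit is the sketch below. It computes clipped n-gram precision, the core quantity behind BLEU, in a deliberately simplified form: a single reference, whitespace tokenization, and no brevity penalty. The function names and example captions are illustrative assumptions, not taken from any benchmark or evaluation toolkit.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate caption against one reference.

    Simplified sketch: whitespace tokenization, one reference, no brevity penalty.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Credit each candidate n-gram at most as often as it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical captions, made up for illustration.
reference = "a dog barks while cars pass by"
paraphrase = "traffic noise with a canine barking"    # same scene, different words
wrong_but_similar = "a dog barks while birds pass by"  # different scene, similar words

print(clipped_precision(paraphrase, reference))         # 1 of 6 unigrams match
print(clipped_precision(wrong_but_similar, reference))  # 6 of 7 unigrams match
```

The paraphrase describes the same sounds but shares only one word with the reference, while the caption that names the wrong sound source scores far higher. Embedding-based metrics such as those built on Sentence-BERT aim to reduce this gap by comparing sentence meaning rather than surface tokens.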

Papers

Showing 31-40 of 119 papers

Title | Status | Hype
Towards Diverse and Efficient Audio Captioning via Diffusion Models | | 0
Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models | | 0
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance | Code | 2
Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning | | 0
Taming Data and Transformers for Audio Generation | Code | 2
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding | Code | 1
Improving Text-To-Audio Models with Synthetic Captions | Code | 5
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Code | 2
Audio Dialogues: Dialogues dataset for audio and music understanding | | 0
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | VAST | CIDEr | 0.78 | | Unverified
2 | VALOR | CIDEr | 0.74 | | Unverified
3 | MQ-Cap | SPIDEr | 0.52 | | Unverified
4 | SLAM-AAC | SPIDEr | 0.52 | | Unverified
5 | LAVCap | SPIDEr | 0.52 | | Unverified
6 | EnCLAP++-large | SPIDEr | 0.51 | | Unverified
7 | AutoCap | SPIDEr | 0.51 | | Unverified
8 | LOAE | SPIDEr | 0.51 | | Unverified
9 | EnCLAP++-base | SPIDEr | 0.5 | | Unverified
10 | EnCLAP-large | SPIDEr | 0.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | CIDEr | 0.52 | | Unverified
2 | VALOR | CIDEr | 0.42 | | Unverified
3 | SLAM-AAC | SPIDEr | 0.33 | | Unverified
4 | LOAE | SPIDEr | 0.33 | | Unverified
5 | MQ-Cap | SPIDEr | 0.32 | | Unverified
6 | Ensemble | SPIDEr | 0.32 | | Unverified
7 | Audio Flamingo (Pengi trainset) | SPIDEr | 0.31 | | Unverified
8 | Ensemble-RL | SPIDEr | 0.3 | | Unverified
9 | Qwen-Audio | SPIDEr | 0.29 | | Unverified
10 | Ensemble | SPIDEr | 0.21 | | Unverified