EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance
2024-09-02
Jaeyeon Kim, Minjeon Jeon, JaeYoon Jung, Sang Hoon Woo, Jinjoo Lee
Code
- github.com/jaeyeonkim99/enclap (official, PyTorch, ★ 94)
Abstract
In this work, we aim to analyze and optimize the EnCLAP framework, a state-of-the-art model in automated audio captioning. We investigate the impact of modifying the acoustic encoder components, explore pretraining with different dataset scales, and study the effectiveness of a reranking scheme. Through extensive experimentation and quantitative analysis of generated captions, we develop EnCLAP++, an enhanced version that significantly surpasses the original.
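The abstract mentions a reranking scheme but does not describe it on this page. Below is a minimal sketch of one common form of caption reranking. It assumes each candidate caption is scored by cosine similarity between an audio embedding and the caption's text embedding (the kind of embeddings a CLAP-style model provides), optionally mixed with the decoder's log-probability. The names `rerank_captions` and `cosine_similarity`, the weight `alpha`, and all inputs are hypothetical. This is not the EnCLAP++ implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rerank_captions(
    captions: Sequence[str],
    caption_embeddings: Sequence[Sequence[float]],
    audio_embedding: Sequence[float],
    log_probs: Sequence[float],
    alpha: float = 0.5,
) -> list[str]:
    """Hypothetical reranker: order candidates by a weighted mix of
    audio-text similarity (weight alpha) and decoder log-probability
    (weight 1 - alpha). Higher combined score ranks first."""
    scores = [
        alpha * cosine_similarity(emb, audio_embedding) + (1.0 - alpha) * lp
        for emb, lp in zip(caption_embeddings, log_probs)
    ]
    order = sorted(range(len(captions)), key=lambda i: scores[i], reverse=True)
    return [captions[i] for i in order]


if __name__ == "__main__":
    # Toy 2-D embeddings, for illustration only.
    captions = ["a dog barks", "rain falls on a roof", "a car engine idles"]
    caption_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    audio_emb = [0.0, 1.0]
    log_probs = [-1.0, -1.2, -1.1]
    print(rerank_captions(captions, caption_embs, audio_emb, log_probs, alpha=1.0))
```

Setting `alpha=1.0` ranks purely by audio-text similarity, while `alpha=0.0` falls back to the decoder's own likelihood ordering. Because cosine similarity and log-probability live on different scales, a practical system would likely normalize them before mixing.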
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| AudioCaps | EnCLAP++-large | SPIDEr | 0.51 | — | Unverified |
| AudioCaps | EnCLAP++-base | SPIDEr | 0.5 | — | Unverified |
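For reference, the SPIDEr metric reported in the table is the arithmetic mean of the CIDEr-D and SPICE captioning scores. The sketch below shows only that final averaging step; the function name `spider` and the input values are hypothetical and are not results from the paper.

```python
def spider(cider_d: float, spice: float) -> float:
    """SPIDEr: the arithmetic mean of CIDEr-D and SPICE."""
    return (cider_d + spice) / 2.0


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(spider(cider_d=0.8, spice=0.2))
```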