| GLAP: General contrastive audio-text pretraining across domains and languages | Jun 12, 2025 | AudioCaps, Keyword Spotting | Code Available | 2 |
| AC/DC: LLM-based Audio Comprehension via Dialogue Continuation | Jun 12, 2025 | AudioCaps, Audio captioning | Unverified | 0 |
| IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling | May 31, 2025 | AudioCaps, Audio Generation | Unverified | 0 |
| Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning | May 28, 2025 | AudioCaps, Audio captioning | Unverified | 0 |
| AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion | May 28, 2025 | AudioCaps, Audio Generation | Unverified | 0 |
| DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap | Mar 15, 2025 | AudioCaps, Audio Generation | Unverified | 0 |
| Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model | Mar 12, 2025 | AudioCaps, Contrastive Learning | Unverified | 0 |
| TAIL: Text-Audio Incremental Learning | Mar 6, 2025 | AudioCaps, Incremental Learning | Unverified | 0 |
| ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors | Feb 20, 2025 | AudioCaps, Contrastive Learning | Code Available | 0 |
| Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning | Feb 8, 2025 | AudioCaps, Audio captioning | Unverified | 0 |