| Addressing Emotion Bias in Music Emotion Recognition and Generation with Frechet Audio Distance | Sep 23, 2024 | Emotion Recognition, FAD | Code Available | 3 |
| FlowDec: A flow-based full-band general audio codec with high perceptual quality | Mar 3, 2025 | FAD | Code Available | 2 |
| KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation | Feb 21, 2025 | Audio Generation, FAD | Code Available | 2 |
| Efficient Autoregressive Audio Modeling via Next-Scale Prediction | Aug 16, 2024 | Audio Generation, FAD | Code Available | 2 |
| L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection | Aug 7, 2024 | 3D Object Detection, Autonomous Navigation | Code Available | 2 |
| Taming Data and Transformers for Audio Generation | Jun 27, 2024 | Audio captioning, Audio Generation | Code Available | 2 |
| MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models | Jun 7, 2024 | FAD, Text-to-Music Generation | Code Available | 2 |
| Adapting Frechet Audio Distance for Generative Music Evaluation | Nov 2, 2023 | FAD | Code Available | 2 |
| MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation | Dec 19, 2022 | cross-modal alignment, Denoising | Code Available | 2 |
| BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation | Jun 11, 2025 | Audio Generation, FAD | Code Available | 1 |