| AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation | May 22, 2023 | audio-visual learning, Image Generation | Code Available | 1 | 5 |
| UAVM: Towards Unifying Audio and Visual Models | Jul 29, 2022 | Audio Classification, audio-visual learning | Code Available | 1 | 5 |
| Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser | May 27, 2023 | audio-visual event localization, audio-visual learning | Code Available | 1 | 5 |
| Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation | Apr 6, 2023 | audio-visual learning, Contrastive Learning | Code Available | 1 | 5 |
| Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration | Dec 17, 2024 | audio-visual event localization, audio-visual learning | Code Available | 1 | 5 |
| Distilling Audio-Visual Knowledge by Compositional Contrastive Learning | Apr 22, 2021 | Audio Tagging, audio-visual learning | Code Available | 1 | 5 |
| Towards Emotion Analysis in Short-form Videos: A Large-Scale Dataset and Baseline | Nov 29, 2023 | audio-visual learning, Form | Code Available | 1 | 5 |
| Enhancing Sound Source Localization via False Negative Elimination | Aug 29, 2024 | audio-visual learning, Contrastive Learning | Code Available | 1 | 5 |
| EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning | Mar 14, 2024 | Audio Classification, audio-visual learning | Code Available | 1 | 5 |
| Deep Video Inpainting Guided by Audio-Visual Self-Supervision | Oct 11, 2023 | audio-visual learning, Video Inpainting | Code Available | 0 | 5 |