| Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser | May 27, 2023 | audio-visual event localization, audio-visual learning | Code Available | 1 |
| AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation | May 22, 2023 | audio-visual learning, Image Generation | Code Available | 1 |
| Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation | Apr 6, 2023 | audio-visual learning, Contrastive Learning | Code Available | 1 |
| UAVM: Towards Unifying Audio and Visual Models | Jul 29, 2022 | Audio Classification, audio-visual learning | Code Available | 1 |
| Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection | Jul 12, 2022 | Anomaly Detection In Surveillance Videos, audio-visual learning | Code Available | 1 |
| Learning to Answer Questions in Dynamic Audio-Visual Scenarios | Mar 26, 2022 | audio-visual learning, Audio-visual Question Answering | Code Available | 1 |
| Cascaded Multilingual Audio-Visual Learning from Videos | Nov 8, 2021 | audio-visual learning, Retrieval | Code Available | 1 |
| Distilling Audio-Visual Knowledge by Compositional Contrastive Learning | Apr 22, 2021 | Audio Tagging, audio-visual learning | Code Available | 1 |
| Can audio-visual integration strengthen robustness under multimodal attacks? | Apr 5, 2021 | audio-visual learning, Visual Localization | Code Available | 1 |
| Lightweight Joint Audio-Visual Deepfake Detection via Single-Stream Multi-Modal Learning Framework | Jun 9, 2025 | audio-visual learning, DeepFake Detection | Unverified | 0 |