| Class-Incremental Grouping Network for Continual Audio-Visual Learning | Sep 11, 2023 | audio-visual learning, class-incremental learning | Code Available | 1 |
| CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment | May 2, 2025 | audio-visual learning, cross-modal alignment | Code Available | 1 |
| Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation | Apr 6, 2023 | audio-visual learning, Contrastive Learning | Code Available | 1 |
| A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition | May 30, 2023 | audio-visual learning | Code Available | 1 |
| AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models | Sep 19, 2023 | audio-visual learning, Representation Learning | Code Available | 1 |
| AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation | May 22, 2023 | audio-visual learning, Image Generation | Code Available | 1 |
| Can audio-visual integration strengthen robustness under multimodal attacks? | Apr 5, 2021 | audio-visual learning, Visual Localization | Code Available | 1 |
| Can CLIP Help Sound Source Localization? | Nov 7, 2023 | audio-visual learning, Contrastive Learning | Code Available | 1 |
| Cascaded Multilingual Audio-Visual Learning from Videos | Nov 8, 2021 | audio-visual learning, Retrieval | Code Available | 1 |
| Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration | Dec 17, 2024 | audio-visual event localization, audio-visual learning | Code Available | 1 |