| Can audio-visual integration strengthen robustness under multimodal attacks? | Apr 5, 2021 | audio-visual learning, Visual Localization | Code Available | 1 | 5 |
| Language-Guided Audio-Visual Learning for Long-Term Sports Assessment | Jan 1, 2025 | audio-visual learning, Knowledge Graphs | Code Available | 1 | 5 |
| A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition | May 30, 2023 | audio-visual learning | Code Available | 1 | 5 |
| Learning to Answer Questions in Dynamic Audio-Visual Scenarios | Mar 26, 2022 | audio-visual learning, Audio-visual Question Answering | Code Available | 1 | 5 |
| AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models | Sep 19, 2023 | audio-visual learning, Representation Learning | Code Available | 1 | 5 |
| Can CLIP Help Sound Source Localization? | Nov 7, 2023 | audio-visual learning, Contrastive Learning | Code Available | 1 | 5 |
| Cascaded Multilingual Audio-Visual Learning from Videos | Nov 8, 2021 | audio-visual learning, Retrieval | Code Available | 1 | 5 |
| CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment | May 2, 2025 | audio-visual learning, cross-modal alignment | Code Available | 1 | 5 |
| Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection | Jul 12, 2022 | Anomaly Detection In Surveillance Videos, audio-visual learning | Code Available | 1 | 5 |
| Class-Incremental Grouping Network for Continual Audio-Visual Learning | Sep 11, 2023 | audio-visual learning, class-incremental learning | Code Available | 1 | 5 |
| AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation | May 22, 2023 | audio-visual learning, Image Generation | Code Available | 1 | 5 |
| UAVM: Towards Unifying Audio and Visual Models | Jul 29, 2022 | Audio Classification, audio-visual learning | Code Available | 1 | 5 |
| Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser | May 27, 2023 | audio-visual event localization, audio-visual learning | Code Available | 1 | 5 |
| Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation | Apr 6, 2023 | audio-visual learning, Contrastive Learning | Code Available | 1 | 5 |
| Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration | Dec 17, 2024 | audio-visual event localization, audio-visual learning | Code Available | 1 | 5 |
| Distilling Audio-Visual Knowledge by Compositional Contrastive Learning | Apr 22, 2021 | Audio Tagging, audio-visual learning | Code Available | 1 | 5 |
| Towards Emotion Analysis in Short-form Videos: A Large-Scale Dataset and Baseline | Nov 29, 2023 | audio-visual learning, Form | Code Available | 1 | 5 |
| Enhancing Sound Source Localization via False Negative Elimination | Aug 29, 2024 | audio-visual learning, Contrastive Learning | Code Available | 1 | 5 |
| EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning | Mar 14, 2024 | Audio Classification, audio-visual learning | Code Available | 1 | 5 |
| Deep Video Inpainting Guided by Audio-Visual Self-Supervision | Oct 11, 2023 | audio-visual learning, Video Inpainting | Code Available | 0 | 5 |
| MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers | Jun 7, 2024 | audio-visual learning, Contrastive Learning | Code Available | 0 | 5 |
| Revisiting Pre-training in Audio-Visual Learning | Feb 7, 2023 | audio-visual learning | Code Available | 0 | 5 |
| Boosting Audio-visual Zero-shot Learning with Large Language Models | Nov 21, 2023 | audio-visual learning, Descriptive | Code Available | 0 | 5 |
| Adversarial-Metric Learning for Audio-Visual Cross-Modal Matching | Jan 12, 2021 | audio-visual learning, Metric Learning | Code Available | 0 | 5 |
| Versatile audio-visual learning for emotion recognition | May 12, 2023 | Arousal Estimation, Attribute | Unverified | 0 | 0 |
| Audio-Visual Embedding for Cross-Modal MusicVideo Retrieval through Supervised Deep CCA | Aug 10, 2019 | audio-visual learning, Retrieval | Unverified | 0 | 0 |
| Deep Audio-Visual Learning: A Survey | Jan 14, 2020 | audio-visual learning, Representation Learning | Unverified | 0 | 0 |
| Few-Shot Audio-Visual Learning of Environment Acoustics | Jun 8, 2022 | audio-visual learning, Room Impulse Response (RIR) | Unverified | 0 | 0 |
| Learning in Audio-visual Context: A Review, Analysis, and New Perspective | Aug 20, 2022 | audio-visual learning, Scene Understanding | Unverified | 0 | 0 |
| Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning | Sep 8, 2023 | audio-visual learning, Quantization | Unverified | 0 | 0 |
| Lightweight Joint Audio-Visual Deepfake Detection via Single-Stream Multi-Modal Learning Framework | Jun 9, 2025 | audio-visual learning, DeepFake Detection | Unverified | 0 | 0 |
| Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization | Jan 16, 2024 | Action Detection, Activity Detection | Unverified | 0 | 0 |
| Object Segmentation with Audio Context | Jan 4, 2023 | audio-visual learning, Decoder | Unverified | 0 | 0 |
| RealImpact: A Dataset of Impact Sound Fields for Real Objects | Jun 16, 2023 | audio-visual learning | Unverified | 0 | 0 |
| Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives | Feb 17, 2025 | Adversarial Robustness, audio-visual learning | Unverified | 0 | 0 |
| Sequential Contrastive Audio-Visual Learning | Jul 8, 2024 | audio-visual learning, Contrastive Learning | Unverified | 0 | 0 |
| Telling Left from Right: Learning Spatial Correspondence of Sight and Sound | Jun 11, 2020 | audio-visual learning | Unverified | 0 | 0 |
| Unveiling Visual Biases in Audio-Visual Localization Benchmarks | Aug 25, 2024 | audio-visual learning, Visual Localization | Unverified | 0 | 0 |