| Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning | Sep 22, 2019 | Sound Source Localization | —Unverified | 0 |
| Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks | Oct 27, 2017 | Sound Source Localization | —Unverified | 0 |
| Sound Source Localization is All about Cross-Modal Alignment | Sep 19, 2023 | All, cross-modal alignment | —Unverified | 0 |
| Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment | Mar 30, 2023 | Scene Generation, Scheduling | —Unverified | 0 |
| SVD-PHAT: A Fast Sound Source Localization Method | Feb 11, 2019 | Sound Source Localization | —Unverified | 0 |
| Text-Queried Target Sound Event Localization | Jun 23, 2024 | Room Impulse Response (RIR), Sound Event Localization and Detection | —Unverified | 0 |
| TF-Mamba: A Time-Frequency Network for Sound Source Localization | Sep 8, 2024 | Mamba, Sound Source Localization | —Unverified | 0 |
| The trajectoRIR Database: Room Acoustic Recordings Along a Trajectory of Moving Microphones | Mar 29, 2025 | Sound Source Localization | —Unverified | 0 |
| VAST : The Virtual Acoustic Space Traveler Dataset | Dec 14, 2016 | Sound Source Localization | —Unverified | 0 |
| Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning | Aug 13, 2020 | Action Recognition, Audio-Visual Synchronization | —Unverified | 0 |