| Vision Transformers are Parameter-Efficient Audio-Visual Learners | Dec 15, 2022 | Audio-visual Question Answering, AUDIO-VISUAL QUESTION ANSWERING (MUSIC-AVQA-v2.0) | Code Available | 1 |
| Learning to Answer Questions in Dynamic Audio-Visual Scenarios | Mar 26, 2022 | audio-visual learning, Audio-visual Question Answering | Code Available | 1 |
| Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos | Oct 11, 2021 | Audio-visual Question Answering, Question Answering | Code Available | 1 |
| Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos | Jan 1, 2021 | Audio-visual Question Answering, Question Answering | Code Available | 1 |
| Learning Sparsity for Effective and Efficient Music Performance Question Answering | Jun 2, 2025 | Audio-visual Question Answering, Question Answering | Unverified | 0 |
| Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs | May 27, 2025 | Audio-visual Question Answering, Question Answering | Code Available | 0 |
| AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning | Jan 1, 2025 | Audio-visual Question Answering, Continual Learning | Code Available | 0 |
| Patch-level Sounding Object Tracking for Audio-Visual Question Answering | Dec 14, 2024 | Audio-visual Question Answering, Object Tracking | Unverified | 0 |
| SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering | Nov 7, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Unverified | 0 |
| OMCAT: Omni Context Aware Transformer | Oct 15, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Unverified | 0 |