| FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning | Apr 1, 2025 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 2 |
| Question-Aware Gaussian Experts for Audio-Visual Question Answering | Mar 6, 2025 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 1 |
| SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering | Nov 7, 2024 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | —Unverified | 0 |
| OMCAT: Omni Context Aware Transformer | Oct 15, 2024 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | —Unverified | 0 |
| Boosting Audio Visual Question Answering via Key Semantic-Aware Cues | Jul 30, 2024 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 1 |
| Learning Trimodal Relation for AVQA with Missing Modality | Jul 23, 2024 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 1 |
| SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering | Jun 14, 2024 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | —Unverified | 0 |
| Towards Multilingual Audio-Visual Question Answering | Jun 13, 2024 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 0 |
| CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | May 13, 2024 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | —Unverified | 0 |
| Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering | Apr 18, 2024 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 1 |
| Answering Diverse Questions via Text Attached with Key Audio-Visual Clues | Mar 11, 2024 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 0 |
| CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | Mar 7, 2024 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 2 |
| Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering | Dec 20, 2023 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 0 |
| CAD -- Contextual Multi-modal Alignment for Dynamic AVQA | Oct 25, 2023 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | —Unverified | 0 |
| Progressive Spatio-temporal Perception for Audio-Visual Question Answering | Aug 10, 2023 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 1 |
| Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios | May 21, 2023 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 0 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | May 18, 2023 | 1 Image, 2*2 StitchiAction Classification | CodeCode Available | 3 |
| VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Apr 17, 2023 | Audio captioningAudio-Video Question Answering (AVQA) | CodeCode Available | 2 |
| Learning to Answer Questions in Dynamic Audio-Visual Scenarios | Mar 26, 2022 | audio-visual learningAudio-visual Question Answering | CodeCode Available | 1 |
| Hierarchical Conditional Relation Networks for Video Question Answering | Feb 25, 2020 | Audio-Visual Question Answering (AVQA)Question Answering | CodeCode Available | 1 |