SOTAVerified

Audio-visual Question Answering

Papers

Showing 1-10 of 27 papers

Title | Status | Hype
Learning Sparsity for Effective and Efficient Music Performance Question Answering | | 0
Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs | Code | 0
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning | Code | 2
PAVE: Patching and Adapting Video Large Language Models | Code | 1
Question-Aware Gaussian Experts for Audio-Visual Question Answering | Code | 1
AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning | Code | 0
Patch-level Sounding Object Tracking for Audio-Visual Question Answering | | 0
SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering | | 0
OMCAT: Omni Context Aware Transformer | | 0
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | VAST | Acc | 80.7 | | Unverified
2 | CoQo(Internvideo2) | Acc | 79.6 | | Unverified
3 | VALOR | Acc | 78.9 | | Unverified
4 | CAD | Acc | 78.26 | | Unverified
5 | LAVISH | Acc | 77.08 | | Unverified
6 | ST-AVQA | Acc | 71.52 | | Unverified