SOTAVerified

Audio-Visual Question Answering (AVQA)

Papers

Showing 1–10 of 20 papers

Title | Status | Hype
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | Code | 2
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning | Code | 2
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Code | 2
Learning Trimodal Relation for AVQA with Missing Modality | Code | 1
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering | Code | 1
Progressive Spatio-temporal Perception for Audio-Visual Question Answering | Code | 1
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues | Code | 1
Question-Aware Gaussian Experts for Audio-Visual Question Answering | Code | 1
Hierarchical Conditional Relation Networks for Video Question Answering | Code | 1

No leaderboard results yet.