
Audio-visual Question Answering

Papers

Showing 11-20 of 27 papers

| Title | Status | Hype |
|---|---|---|
| Learning Trimodal Relation for AVQA with Missing Modality | Code | 1 |
| SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering | | 0 |
| Towards Multilingual Audio-Visual Question Answering | Code | 0 |
| CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | | 0 |
| Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering | Code | 1 |
| Answering Diverse Questions via Text Attached with Key Audio-Visual Clues | Code | 0 |
| CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | Code | 2 |
| Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering | Code | 0 |
| CAD -- Contextual Multi-modal Alignment for Dynamic AVQA | | 0 |
| Progressive Spatio-temporal Perception for Audio-Visual Question Answering | Code | 1 |

Benchmark Results

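| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VAST | Acc | 80.7 | | Unverified |
| 2 | CoQo(Internvideo2) | Acc | 79.6 | | Unverified |
| 3 | VALOR | Acc | 78.9 | | Unverified |
| 4 | CAD | Acc | 78.26 | | Unverified |
| 5 | LAVISH | Acc | 77.08 | | Unverified |
| 6 | ST-AVQA | Acc | 71.52 | | Unverified |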