Audiovisual Masked Autoencoders

2022-12-09 · ICCV 2023 · Code Available

Mariana-Iuliana Georgescu, Eduardo Fonseca, Radu Tudor Ionescu, Mario Lucic, Cordelia Schmid, Anurag Arnab

Abstract

Can we leverage the audiovisual information already present in video to improve self-supervised representation learning? To answer this question, we study various pretraining architectures and objectives within the masked autoencoding framework, motivated by the success of similar methods in natural language and image understanding. We show that we can achieve significant improvements on audiovisual downstream classification tasks, surpassing the state-of-the-art on VGGSound and AudioSet. Furthermore, we can leverage our audiovisual pretraining scheme for multiple unimodal downstream tasks using a single audiovisual pretrained model. We additionally demonstrate the transferability of our representations, achieving state-of-the-art audiovisual results on Epic Kitchens without pretraining specifically for this dataset.
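The pretraining scheme described above follows the masked-autoencoding recipe: patchify both modalities, hide most patches from the encoder, and train a decoder to reconstruct the hidden patches. The following is a minimal numpy sketch of that data flow under one of the fusion designs the paper studies (early fusion of video and audio tokens); the single linear maps stand in for the transformer encoder and decoder, and all shapes, names, and the stand-in mask token are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def masked_autoencode(video_tokens, audio_tokens, mask_ratio, rng):
    """One toy forward pass of an audiovisual masked autoencoder.

    Illustrates the data flow only: mask each token, encode the
    visible tokens of both modalities jointly (early fusion), decode
    with a shared mask token at hidden positions, and score the
    reconstruction only on the masked patches (MAE-style loss).
    Linear maps replace the transformer blocks of the real model.
    """
    d = video_tokens.shape[1]
    # Early fusion: concatenate video and audio patch embeddings.
    tokens = np.concatenate([video_tokens, audio_tokens], axis=0)
    n = tokens.shape[0]

    # Random per-token mask (True = hidden from the encoder).
    mask = rng.random(n) < mask_ratio
    visible = tokens[~mask]

    # Stand-in "encoder": one linear map over visible tokens only.
    W_enc = rng.normal(size=(d, d)) / np.sqrt(d)
    latent = visible @ W_enc

    # Decoder input: encoded latents at visible slots, a shared
    # stand-in "learned" mask token at every masked slot.
    dec_in = np.zeros_like(tokens)
    dec_in[~mask] = latent
    dec_in[mask] = rng.normal(size=d) / np.sqrt(d)

    # Stand-in "decoder" predicts the original patch embeddings.
    W_dec = rng.normal(size=(d, d)) / np.sqrt(d)
    recon = dec_in @ W_dec

    # Loss is computed on masked patches only, as in MAE.
    loss = float(np.mean((recon[mask] - tokens[mask]) ** 2))
    return recon, mask, loss

rng = np.random.default_rng(0)
video = rng.normal(size=(16, 8))   # e.g. 16 video patch embeddings
audio = rng.normal(size=(8, 8))    # e.g. 8 audio-spectrogram patches
recon, mask, loss = masked_autoencode(video, audio, mask_ratio=0.75, rng=rng)
```

Because only the masked positions contribute to the loss, the encoder sees roughly a quarter of the tokens at the 0.75 masking ratio, which is what makes this style of pretraining cheap relative to processing full clips.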

Benchmark Results

Dataset           | Model                                                  | Metric         | Claimed | Verified | Status
AudioSet          | Audiovisual Masked Autoencoder (Audiovisual, Single)   | Test mAP       | 0.52    |          | Unverified
AudioSet          | Audiovisual Masked Autoencoder (Audio-only, Single)    | Test mAP       | 0.47    |          | Unverified
EPIC-KITCHENS-100 | Audiovisual Masked Autoencoder (Audio-only, Single)    | Top-1 Action   | 19.7    |          | Unverified
EPIC-KITCHENS-100 | Audiovisual Masked Autoencoder (Audiovisual, Single)   | Top-1 Action   | 46      |          | Unverified
EPIC-KITCHENS-100 | Audiovisual Masked Autoencoder (Video-only, Single)    | Top-1 Action   | 45.8    |          | Unverified
VGGSound          | Audiovisual Masked Autoencoder (Audio-only, Single)    | Top-1 Accuracy | 57.2    |          | Unverified
VGGSound          | Audiovisual Masked Autoencoder (Audiovisual, Single)   | Top-1 Accuracy | 65      |          | Unverified