Egocentric Video-Language Pretraining @ Ego4D Challenge 2022

2022-07-04

Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, RongCheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou

Code

github.com/showlab/egovlp
Official, in paper, PyTorch, ★ 258

Abstract

In this report, we propose a video-language pretraining (VLP) based solution [kevin2022egovlp] for four Ego4D challenge tasks: Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR). Specifically, we exploit the recently released Ego4D dataset [grauman2021ego4d] to pioneer Egocentric VLP along three directions: pretraining dataset, pretraining objective, and development set. Based on these three designs, we develop a pretrained video-language model that can transfer its egocentric video-text representation or video-only representation to several video downstream tasks. Our Egocentric VLP achieves 10.46 R@1 at IoU=0.3 on NLQ, 10.33 mAP on MQ, 74% accuracy on OSCC, and 0.67 sec error on PNR. The code is available at https://github.com/showlab/EgoVLP.
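To make the NLQ metric concrete, the following is a minimal sketch of how R@1 at a temporal IoU threshold is commonly computed for moment localization. It is an illustration only, not the official Ego4D evaluation script: the names `temporal_iou` and `recall_at_1`, the `(start_sec, end_sec)` interval format, and the assumption of one ground-truth moment per query are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical interval format for this sketch: (start_sec, end_sec).
Interval = Tuple[float, float]


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Intersection-over-union of two temporal intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(
    top1_preds: List[Interval],
    gts: List[Interval],
    iou_threshold: float = 0.3,
) -> float:
    """Percentage of queries whose top-ranked prediction overlaps the
    ground-truth moment with IoU >= iou_threshold.

    Assumes exactly one ground-truth interval per query (an assumption
    of this sketch, not a statement about the benchmark).
    """
    if len(top1_preds) != len(gts):
        raise ValueError("need one top-1 prediction per ground-truth moment")
    if not gts:
        return 0.0
    hits = sum(
        temporal_iou(p, g) >= iou_threshold for p, g in zip(top1_preds, gts)
    )
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    preds = [(0.0, 10.0), (5.0, 6.0), (20.0, 30.0)]
    truth = [(2.0, 12.0), (50.0, 60.0), (20.0, 30.0)]
    print(f"R@1 at IoU=0.3: {recall_at_1(preds, truth):.2f}")
```

Under this reading, a score of 10.46 would mean that for roughly one query in ten the top-ranked predicted window overlaps the annotated moment with IoU of at least 0.3.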

Tasks

Language Modelling, Object State Change Classification
