InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

2022-11-17

Guo Chen, Sen Xing, Zhe Chen, Yi Wang, Kunchang Li, Yizhuo Li, Yi Liu, Jiahao Wang, Yin-Dong Zheng, Bingkun Huang, Zhiyu Zhao, Junting Pan, Yifei HUANG, Zun Wang, Jiashuo Yu, Yinan He, Hongjie Zhang, Tong Lu, Yali Wang, LiMin Wang, Yu Qiao

Code

github.com/opengvlab/ego4d-eccv2022-solutions (official, in paper, PyTorch, ★ 133)
github.com/jonnys1226/ego4d_asl (PyTorch, ★ 9)

Abstract

In this report, we present our champion solutions to five tracks of the Ego4D challenge. We apply InternVideo, the video foundation model we developed, to five Ego4D tasks: Moment Queries, Natural Language Queries, Future Hand Prediction, State Change Object Detection, and Short-term Object Interaction Anticipation. InternVideo-Ego4D is an effective paradigm for adapting a strong foundation model to downstream egocentric video understanding tasks using simple head designs. On all five tasks, InternVideo-Ego4D outperforms both the baseline methods and the CVPR 2022 champion solutions, demonstrating the strong representation ability of InternVideo as a video foundation model. Our code will be released at https://github.com/OpenGVLab/ego4d-eccv2022-solutions

Tasks

Future Hand Prediction, Moment Queries, Natural Language Queries, Object, Object Detection, Short-term Object Interaction Anticipation, State Change Object Detection, Video Understanding

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Ego4D	InternVideo	Disp(Total)	196.8	—	Unverified