SOTAVerified

TGIF-Frame

Papers

Showing 11–15 of 15 papers

Title | Status | Hype
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | Code | 1
Lightweight Recurrent Cross-modal Encoder for Video Question Answering | Code | 0
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | | 0
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | | 0
MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling | | 0

No leaderboard results yet.