| Self-Supervised Video Representation Learning via Latent Time Navigation | May 10, 2023 | Action Classification, Action Recognition | Unverified | 0 |
| AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation | Apr 24, 2023 | 3D Hand Pose Estimation, Action Classification | Code Available | 1 |
| Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Apr 20, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| VicTR: Video-conditioned Text Representations for Activity Recognition | Apr 5, 2023 | Action Classification, Activity Recognition | Unverified | 0 |
| VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | Mar 29, 2023 | Action Classification, Action Recognition | Code Available | 2 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | Mar 28, 2023 | Action Classification, Action Recognition | Code Available | 0 |
| The effectiveness of MAE pre-pretraining for billion-scale pretraining | Mar 23, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| Multi-modal Prompting for Low-Shot Temporal Action Localization | Mar 21, 2023 | Action Classification, Action Localization | Unverified | 0 |
| ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders | Mar 21, 2023 | Action Classification, Action Recognition | Code Available | 0 |
| Dual-path Adaptation from Image to Video Transformers | Mar 17, 2023 | Action Classification, Action Recognition | Code Available | 1 |