| MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound | Jan 7, 2022 | Action Classification, Navigate | Unverified | 0 |
| Improving Video Model Transfer With Dynamic Representation Learning | Jan 1, 2022 | Action Classification, Knowledge Distillation | Unverified | 0 |
| Spatio-Temporal CNN baseline method for the Sports Video Task of MediaEval 2021 benchmark | Dec 16, 2021 | Action Classification, Action Detection | Code Available | 0 |
| Masked Feature Prediction for Self-Supervised Visual Pre-Training | Dec 16, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| Co-training Transformer with Videos and Images Improves Action Recognition | Dec 14, 2021 | Action Classification, Action Recognition | Unverified | 0 |
| MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Dec 2, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| Self-supervised Video Transformer | Dec 2, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| PreViTS: Contrastive Pretraining with Video Tracking Supervision | Dec 1, 2021 | Action Classification, Self-Supervised Learning | Unverified | 0 |
| Low-Fidelity Video Encoder Optimization for Temporal Action Localization | Dec 1, 2021 | Action Classification, Action Localization | Unverified | 0 |
| Reformulating Zero-shot Action Recognition for Multi-label Actions | Dec 1, 2021 | Action Classification, Action Detection | Unverified | 0 |