| Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | Jun 1, 2023 | Action Classification, Action Recognition | Code Available | 0 |
| Self-Supervised Video Representation Learning via Latent Time Navigation | May 10, 2023 | Action Classification, Action Recognition | Unverified | 0 |
| VicTR: Video-conditioned Text Representations for Activity Recognition | Apr 5, 2023 | Action Classification, Activity Recognition | Unverified | 0 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | Mar 28, 2023 | Action Classification, Action Recognition | Code Available | 0 |
| Multi-modal Prompting for Low-Shot Temporal Action Localization | Mar 21, 2023 | Action Classification, Action Localization | Unverified | 0 |
| ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders | Mar 21, 2023 | Action Classification, Action Recognition | Code Available | 0 |
| Classification of Primitive Manufacturing Tasks from Filtered Event Data | Mar 15, 2023 | Action Classification, Classification | Unverified | 0 |
| Scaling Vision Transformers to 22 Billion Parameters | Feb 10, 2023 | Action Classification, Fairness | Code Available | 0 |
| Fine-Grained Action Detection with RGB and Pose Information using Two Stream Convolutional Networks | Feb 6, 2023 | Action Classification, Action Detection | Code Available | 0 |
| Baseline Method for the Sport Task of MediaEval 2022 with 3D CNNs using Attention Mechanisms | Feb 6, 2023 | Action Classification, Action Detection | Code Available | 0 |