| Open-Vocabulary Video Relation Extraction | Dec 25, 2023 | Action Classification, Action Understanding | Code Available | 1 |
| No More Shortcuts: Realizing the Potential of Temporal Self-Supervision | Dec 20, 2023 | Action Classification, Attribute | Unverified | 0 |
| ST(OR)²: Spatio-Temporal Object Level Reasoning for Activity Recognition in the Operating Room | Dec 19, 2023 | Action Classification, Activity Recognition | Unverified | 0 |
| CAST: Cross-Attention in Space and Time for Video Action Recognition | Nov 30, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living | Nov 30, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | Nov 28, 2023 | Action Classification, Action Recognition | Unverified | 0 |
| Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | Nov 27, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| ADM-Loc: Actionness Distribution Modeling for Point-supervised Temporal Action Localization | Nov 27, 2023 | Action Classification, Action Detection | Unverified | 0 |
| Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities | Nov 9, 2023 | Action Classification, Audio Classification | Unverified | 0 |
| OmniVec: Learning robust representations with cross modal sharing | Nov 7, 2023 | 3D Point Cloud Classification, Action Classification | Unverified | 0 |