| Open-Vocabulary Video Relation Extraction | Dec 25, 2023 | Action Classification; Action Understanding | Code Available | 1 |
| No More Shortcuts: Realizing the Potential of Temporal Self-Supervision | Dec 20, 2023 | Action Classification; Attribute | Unverified | 0 |
| ST(OR)2: Spatio-Temporal Object Level Reasoning for Activity Recognition in the Operating Room | Dec 19, 2023 | Action Classification; Activity Recognition | Unverified | 0 |
| Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living | Nov 30, 2023 | Action Classification; Action Recognition | Code Available | 1 |
| CAST: Cross-Attention in Space and Time for Video Action Recognition | Nov 30, 2023 | Action Classification; Action Recognition | Code Available | 1 |
| Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | Nov 28, 2023 | Action Classification; Action Recognition | Unverified | 0 |
| Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | Nov 27, 2023 | Action Classification; Action Recognition | Code Available | 1 |
| ADM-Loc: Actionness Distribution Modeling for Point-supervised Temporal Action Localization | Nov 27, 2023 | Action Classification; Action Detection | Unverified | 0 |
| Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities | Nov 9, 2023 | Action Classification; Audio Classification | Unverified | 0 |
| OmniVec: Learning robust representations with cross modal sharing | Nov 7, 2023 | 3D Point Cloud Classification; Action Classification | Unverified | 0 |
| Asymmetric Masked Distillation for Pre-Training Small Foundation Models | Nov 6, 2023 | Action Classification; Action Recognition | Code Available | 0 |
| After-Stroke Arm Paresis Detection using Kinematic Data | Nov 3, 2023 | Action Classification; Knowledge Distillation | Unverified | 0 |
| Proposal-based Temporal Action Localization with Point-level Supervision | Oct 9, 2023 | Action Classification; Action Localization | Unverified | 0 |
| ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video | Oct 2, 2023 | Action Classification; Action Recognition | Code Available | 1 |
| SkeleTR: Towards Skeleton-based Action Recognition in the Wild | Sep 20, 2023 | Action Classification; Action Detection | Unverified | 0 |
| MOFO: MOtion FOcused Self-Supervision for Video Understanding | Aug 23, 2023 | Action Classification; Action Recognition | Code Available | 0 |
| Progression-Guided Temporal Action Detection in Videos | Aug 18, 2023 | Action Classification; Action Detection | Code Available | 0 |
| ALIP: Adaptive Language-Image Pre-training with Synthetic Caption | Aug 16, 2023 | Action Classification; Image-text Retrieval | Code Available | 1 |
| Temporally-Adaptive Models for Efficient Video Understanding | Aug 10, 2023 | Action Classification; Action Recognition | Code Available | 0 |
| Joint Skeletal and Semantic Embedding Loss for Micro-gesture Classification | Jul 20, 2023 | Action Classification; Classification | Code Available | 0 |
| Actor-agnostic Multi-label Action Recognition with Multi-modal Query | Jul 20, 2023 | Action Classification; Action Recognition | Code Available | 1 |
| What Can Simple Arithmetic Operations Do for Temporal Modeling? | Jul 18, 2023 | Action Classification; Action Recognition | Code Available | 1 |
| Semi Supervised Meta Learning for Spatiotemporal Learning | Jul 9, 2023 | Action Classification; Classification | Unverified | 0 |
| Spiking Two-Stream Methods with Unsupervised STDP-based Learning for Action Recognition | Jun 23, 2023 | Action Classification; Action Recognition | Unverified | 0 |
| Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers | Jun 15, 2023 | Action Classification; Action Recognition | Code Available | 1 |
| How Object Information Improves Skeleton-based Human Action Recognition in Assembly Tasks | Jun 9, 2023 | Action Classification; Action Recognition | Unverified | 0 |
| Human Action Recognition in Egocentric Perspective Using 2D Object and Hands Pose | Jun 8, 2023 | Action Classification; Action Recognition | Unverified | 0 |
| HomE: Homography-Equivariant Video Representation Learning | Jun 2, 2023 | Action Classification; Action Recognition | Code Available | 0 |
| Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | Jun 1, 2023 | Action Classification; Action Recognition | Code Available | 0 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | May 18, 2023 | 1 Image, 2*2 Stitchi; Action Classification | Code Available | 3 |
| Self-Supervised Video Representation Learning via Latent Time Navigation | May 10, 2023 | Action Classification; Action Recognition | Unverified | 0 |
| AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation | Apr 24, 2023 | 3D Hand Pose Estimation; Action Classification | Code Available | 1 |
| Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Apr 20, 2023 | Action Classification; Action Recognition | Code Available | 1 |
| VicTR: Video-conditioned Text Representations for Activity Recognition | Apr 5, 2023 | Action Classification; Activity Recognition | Unverified | 0 |
| VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | Mar 29, 2023 | Action Classification; Action Recognition | Code Available | 2 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | Mar 28, 2023 | Action Classification; Action Recognition | Code Available | 0 |
| The effectiveness of MAE pre-pretraining for billion-scale pretraining | Mar 23, 2023 | Action Classification; Action Recognition | Code Available | 1 |
| Multi-modal Prompting for Low-Shot Temporal Action Localization | Mar 21, 2023 | Action Classification; Action Localization | Unverified | 0 |
| ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders | Mar 21, 2023 | Action Classification; Action Recognition | Code Available | 0 |
| Dual-path Adaptation from Image to Video Transformers | Mar 17, 2023 | Action Classification; Action Recognition | Code Available | 1 |
| Classification of Primitive Manufacturing Tasks from Filtered Event Data | Mar 15, 2023 | Action Classification; Classification | Unverified | 0 |
| Scaling Vision Transformers to 22 Billion Parameters | Feb 10, 2023 | Action Classification; Fairness | Code Available | 0 |
| Fine-Grained Action Detection with RGB and Pose Information using Two Stream Convolutional Networks | Feb 6, 2023 | Action Classification; Action Detection | Code Available | 0 |
| AIM: Adapting Image Models for Efficient Video Action Recognition | Feb 6, 2023 | Action Classification; Action Recognition | Code Available | 2 |
| Baseline Method for the Sport Task of MediaEval 2022 with 3D CNNs using Attention Mechanisms | Feb 6, 2023 | Action Classification; Action Detection | Code Available | 0 |
| Deep Dependency Networks for Multi-Label Classification | Feb 1, 2023 | Action Classification; Classification | Unverified | 0 |
| mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Feb 1, 2023 | Action Classification; Image Classification | Code Available | 4 |
| Actor-Director-Critic: A Novel Deep Reinforcement Learning Framework | Jan 10, 2023 | Action Classification; Decision Making | Unverified | 0 |
| HierVL: Learning Hierarchical Video-Language Embeddings | Jan 5, 2023 | Action Classification; Action Recognition | Code Available | 1 |
| ReGen: A good Generative Zero-Shot Video Classifier Should be Rewarded | Jan 1, 2023 | Action Classification; Action Recognition | Unverified | 0 |