| CVNets: High Performance Library for Computer Vision | Jun 4, 2022 | Video Understanding, Vocal Bursts Intensity Prediction | Code Available | 6 |
| Development of a MultiModal Annotation Framework and Dataset for Deep Video Understanding | Jun 1, 2022 | Knowledge Graphs, Video Understanding | Unverified | 0 |
| From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering | May 30, 2022 | counterfactual, Descriptive | Code Available | 1 |
| Free Lunch for Surgical Video Understanding by Distilling Self-Supervisions | May 19, 2022 | Contrastive Learning, Self-Supervised Learning | Code Available | 1 |
| ETAD: Training Action Detection End to End on a Laptop | May 14, 2022 | Action Detection, GPU | Code Available | 1 |
| BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection | May 5, 2022 | Action Detection, object-detection | Code Available | 1 |
| i-Code: An Integrative and Composable Multimodal Learning Framework | May 3, 2022 | Contrastive Learning, Video Understanding | Unverified | 0 |
| Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering | May 1, 2022 | Question Answering, Video Classification | Unverified | 0 |
| Flamingo: a Visual Language Model for Few-Shot Learning | Apr 29, 2022 | Few-Shot Learning, Generative Visual Question Answering | Code Available | 4 |
| Causal Reasoning Meets Visual Representation Learning: A Prospective Study | Apr 26, 2022 | Benchmarking, Out-of-Distribution Generalization | Unverified | 0 |
| Contrastive Language-Action Pre-training for Temporal Localization | Apr 26, 2022 | Action Localization, Contrastive Learning | Unverified | 0 |
| Revealing Occlusions with 4D Neural Fields | Apr 22, 2022 | Video Understanding | Unverified | 0 |
| A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions | Apr 21, 2022 | Action Detection, Video Understanding | Code Available | 1 |
| Less than Few: Self-Shot Video Instance Segmentation | Apr 19, 2022 | Few-Shot Learning, Instance Segmentation | Unverified | 0 |
| ActAR: Actor-Driven Pose Embeddings for Video Action Recognition | Apr 19, 2022 | Action Recognition, Optical Flow Estimation | Unverified | 0 |
| Adversarial Machine Learning Attacks Against Video Anomaly Detection Systems | Apr 7, 2022 | Anomaly Detection, BIG-bench Machine Learning | Unverified | 0 |
| MM-SEAL: A Large-scale Video Dataset of Multi-person Multi-grained Spatio-temporally Action Localization | Apr 6, 2022 | Action Localization, Action Recognition | Unverified | 0 |
| Temporal Alignment Networks for Long-term Video | Apr 6, 2022 | Action Recognition, Action Segmentation | Code Available | 1 |
| An Empirical Study of End-to-End Temporal Action Detection | Apr 6, 2022 | Action Classification, Action Detection | Code Available | 1 |
| Long Movie Clip Classification with State-Space Video Models | Apr 4, 2022 | Classification, Decoder | Code Available | 1 |
| PYSKL: a toolbox for skeleton-based video understanding | Apr 2, 2022 | Skeleton Based Action Recognition, Video Understanding | Unverified | 0 |
| SPAct: Self-supervised Privacy Preservation for Action Recognition | Mar 29, 2022 | Action Classification, Action Recognition | Code Available | 1 |
| How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning? | Mar 27, 2022 | Self-Supervised Learning, Sensitivity | Code Available | 1 |
| FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks | Mar 24, 2022 | Action Recognition, Retrieval | Code Available | 0 |
| VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | Mar 23, 2022 | 4k, Action Classification | Code Available | 3 |