| Inverse Compositional Learning for Weakly-supervised Relation Grounding | Jan 1, 2023 | Relation, Video Understanding | Unverified | 0 |
| Self-Supervised Object Detection from Egocentric Videos | Jan 1, 2023 | Class-agnostic Object Detection, Object | Unverified | 0 |
| Relational Space-Time Query in Long-Form Videos | Jan 1, 2023 | Form, Video Understanding | Unverified | 0 |
| Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning | Jan 1, 2023 | Contrastive Learning, Representation Learning | Code Available | 1 |
| Few-Shot Referring Relationships in Videos | Jan 1, 2023 | Object, Relation Network | Code Available | 0 |
| Joint Engagement Classification using Video Augmentation Techniques for Multi-person Human-robot Interaction | Dec 28, 2022 | Data Augmentation, Face Swapping | Unverified | 0 |
| Inductive Attention for Video Action Anticipation | Dec 17, 2022 | Action Anticipation, Action Recognition | Unverified | 0 |
| Towards Smooth Video Composition | Dec 14, 2022 | Image Generation, single-image-generation | Code Available | 1 |
| Egocentric Video Task Translation | Dec 13, 2022 | Multi-Task Learning, Translation | Unverified | 0 |
| Contextual Explainable Video Representation: Human Perception-based Understanding | Dec 12, 2022 | Action Detection, Action Recognition | Code Available | 0 |
| PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data | Dec 8, 2022 | Action Recognition, Prompt Learning | Unverified | 0 |
| Transition Is a Process: Pair-to-Video Change Detection Networks for Very High Resolution Remote Sensing Images | Dec 7, 2022 | Building change detection for remote sensing images, Change Detection | Unverified | 0 |
| InternVideo: General Video Foundation Models via Generative and Discriminative Learning | Dec 6, 2022 | Action Classification, Action Recognition | Code Available | 4 |
| Spatio-Temporal Crop Aggregation for Video Representation Learning | Nov 30, 2022 | Action Classification, Dimensionality Reduction | Unverified | 0 |
| MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing | Nov 28, 2022 | Activity Recognition, Few Shot Action Recognition | Code Available | 1 |
| Dynamic Appearance: A Video Representation for Action Recognition with Joint Training | Nov 23, 2022 | Action Recognition, Temporal Action Localization | Unverified | 0 |
| Contrastive Masked Autoencoders for Self-Supervised Video Hashing | Nov 21, 2022 | Decoder, Retrieval | Code Available | 1 |
| A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset | Nov 19, 2022 | Common Sense Reasoning, Graph Embedding | Unverified | 0 |
| EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens | Nov 19, 2022 | Action Recognition, Object State Change Classification | Code Available | 1 |
| Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022 | Nov 18, 2022 | Object State Change Classification, Temporal Localization | Code Available | 0 |
| InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges | Nov 17, 2022 | Future Hand Prediction, Moment Queries | Code Available | 1 |
| UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | Nov 17, 2022 | Video Understanding | Code Available | 2 |
| Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022 | Nov 16, 2022 | Human-Object Interaction Detection, Object | Unverified | 0 |
| Grounded Video Situation Recognition | Oct 19, 2022 | Descriptive, Structured Prediction | Unverified | 0 |
| VTC: Improving Video-Text Retrieval with User Comments | Oct 19, 2022 | Representation Learning, Retrieval | Code Available | 1 |