| Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation | Mar 20, 2021 | Action Segmentation, Clustering | Code Available | 1 |
| ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation | Mar 19, 2021 | Object, Referring Expression Segmentation | Unverified | 0 |
| Enhancing Transformer for Video Understanding Using Gated Multi-Level Attention and Temporal Adversarial Training | Mar 18, 2021 | Video Understanding | Unverified | 0 |
| PcmNet: Position-Sensitive Context Modeling Network for Temporal Action Localization | Mar 9, 2021 | Action Localization, Boundary Detection | Unverified | 0 |
| Unsupervised Motion Representation Enhanced Network for Action Recognition | Mar 5, 2021 | Action Recognition, Optical Flow Estimation | Unverified | 0 |
| Win-Fail Action Recognition | Feb 15, 2021 | Action Recognition, Action Understanding | Code Available | 0 |
| Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | Feb 14, 2021 | Action Recognition, Temporal Action Localization | Code Available | 1 |
| Is Space-Time Attention All You Need for Video Understanding? | Feb 9, 2021 | Action Classification, Action Recognition | Code Available | 2 |
| Relaxed Transformer Decoders for Direct Action Proposal Generation | Feb 3, 2021 | Action Detection, Temporal Action Proposal Generation | Code Available | 1 |
| Occluded Video Instance Segmentation: A Benchmark | Feb 2, 2021 | Instance Segmentation, Segmentation | Code Available | 1 |
| TCLR: Temporal Contrastive Learning for Video Representation | Jan 20, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| TrackFormer: Multi-Object Tracking with Transformers | Jan 7, 2021 | Decoder, Multi-Object Tracking | Code Available | 1 |
| CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization | Jan 1, 2021 | Action Localization, Imitation Learning | Unverified | 0 |
| Attention Is Not Enough: Mitigating the Distribution Discrepancy in Asynchronous Multimodal Sequence Fusion | Jan 1, 2021 | Time Series, Time Series Analysis | Unverified | 0 |
| Global Self-Attention Networks | Jan 1, 2021 | Video Understanding | Unverified | 0 |
| Learning Self-Similarity in Space and Time as a Generalized Motion for Action Recognition | Jan 1, 2021 | Action Recognition, Video Understanding | Code Available | 1 |
| Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization | Jan 1, 2021 | Action Localization, Video Understanding | Unverified | 0 |
| A Comprehensive Study of Deep Video Action Recognition | Dec 11, 2020 | Action Recognition, Deep Learning | Code Available | 1 |
| Understanding Action Sequences based on Video Captioning for Learning-from-Observation | Dec 9, 2020 | Video Captioning, Video Understanding | Unverified | 0 |
| End-to-End Video Instance Segmentation with Transformers | Nov 30, 2020 | Instance Segmentation, Segmentation | Code Available | 1 |
| t-EVA: Time-Efficient t-SNE Video Annotation | Nov 26, 2020 | Dimensionality Reduction, Video Classification | Unverified | 0 |
| SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos | Nov 26, 2020 | Action Spotting, Boundary Detection | Code Available | 1 |
| Can Temporal Information Help with Contrastive Self-Supervised Learning? | Nov 25, 2020 | Data Augmentation, Representation Learning | Unverified | 0 |
| QuerYD: A video dataset with high-quality text and audio narrations | Nov 22, 2020 | Retrieval, Video Understanding | Code Available | 1 |
| Cycle-Contrast for Self-Supervised Video Representation Learning | Oct 28, 2020 | Action Recognition, Contrastive Learning | Unverified | 0 |