| Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation | Mar 20, 2021 | Action Segmentation, Clustering | Code Available | 1 |
| ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation | Mar 19, 2021 | Object, Referring Expression Segmentation | Unverified | 0 |
| Enhancing Transformer for Video Understanding Using Gated Multi-Level Attention and Temporal Adversarial Training | Mar 18, 2021 | Video Understanding | Unverified | 0 |
| PcmNet: Position-Sensitive Context Modeling Network for Temporal Action Localization | Mar 9, 2021 | Action Localization, Boundary Detection | Unverified | 0 |
| Unsupervised Motion Representation Enhanced Network for Action Recognition | Mar 5, 2021 | Action Recognition, Optical Flow Estimation | Unverified | 0 |
| Win-Fail Action Recognition | Feb 15, 2021 | Action Recognition, Action Understanding | Code Available | 0 |
| Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | Feb 14, 2021 | Action Recognition, Temporal Action Localization | Code Available | 1 |
| Is Space-Time Attention All You Need for Video Understanding? | Feb 9, 2021 | Action Classification, Action Recognition | Code Available | 2 |
| Relaxed Transformer Decoders for Direct Action Proposal Generation | Feb 3, 2021 | Action Detection, Temporal Action Proposal Generation | Code Available | 1 |
| Occluded Video Instance Segmentation: A Benchmark | Feb 2, 2021 | Instance Segmentation, Segmentation | Code Available | 1 |
| TCLR: Temporal Contrastive Learning for Video Representation | Jan 20, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| TrackFormer: Multi-Object Tracking with Transformers | Jan 7, 2021 | Decoder, Multi-Object Tracking | Code Available | 1 |
| CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization | Jan 1, 2021 | Action Localization, Imitation Learning | Unverified | 0 |
| Attention Is Not Enough: Mitigating the Distribution Discrepancy in Asynchronous Multimodal Sequence Fusion | Jan 1, 2021 | Time Series, Time Series Analysis | Unverified | 0 |
| Global Self-Attention Networks | Jan 1, 2021 | Video Understanding | Unverified | 0 |
| Learning Self-Similarity in Space and Time as a Generalized Motion for Action Recognition | Jan 1, 2021 | Action Recognition, Video Understanding | Code Available | 1 |
| Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization | Jan 1, 2021 | Action Localization, Video Understanding | Unverified | 0 |
| A Comprehensive Study of Deep Video Action Recognition | Dec 11, 2020 | Action Recognition, Deep Learning | Code Available | 1 |
| Understanding Action Sequences based on Video Captioning for Learning-from-Observation | Dec 9, 2020 | Video Captioning, Video Understanding | Unverified | 0 |
| End-to-End Video Instance Segmentation with Transformers | Nov 30, 2020 | Instance Segmentation, Segmentation | Code Available | 1 |
| t-EVA: Time-Efficient t-SNE Video Annotation | Nov 26, 2020 | Dimensionality Reduction, Video Classification | Unverified | 0 |
| SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos | Nov 26, 2020 | Action Spotting, Boundary Detection | Code Available | 1 |
| Can Temporal Information Help with Contrastive Self-Supervised Learning? | Nov 25, 2020 | Data Augmentation, Representation Learning | Unverified | 0 |
| QuerYD: A video dataset with high-quality text and audio narrations | Nov 22, 2020 | Retrieval, Video Understanding | Code Available | 1 |
| Cycle-Contrast for Self-Supervised Video Representation Learning | Oct 28, 2020 | Action Recognition, Contrastive Learning | Unverified | 0 |
| Co-attentional Transformers for Story-Based Video Understanding | Oct 27, 2020 | Question Answering, Video Question Answering | Unverified | 0 |
| Improved Actor Relation Graph based Group Activity Recognition | Oct 24, 2020 | Activity Recognition, Group Activity Recognition | Code Available | 1 |
| Egok360: A 360 Egocentric Kinetic Human Activity Video Dataset | Oct 15, 2020 | Activity Recognition, Egocentric Activity Recognition | Unverified | 0 |
| Video Action Understanding | Oct 13, 2020 | Action Understanding, Deep Learning | Code Available | 0 |
| Global Self-Attention Networks for Image Recognition | Oct 6, 2020 | Video Understanding | Unverified | 0 |
| Features Understanding in 3D CNNs for Actions Recognition in Video | Oct 1, 2020 | Action Recognition, Decision Making | Code Available | 0 |
| PAN: Towards Fast Action Recognition via Learning Persistence of Appearance | Aug 8, 2020 | Action Recognition, Optical Flow Estimation | Code Available | 1 |
| Residual Frames with Efficient Pseudo-3D CNN for Human Action Recognition | Aug 3, 2020 | Action Recognition, Optical Flow Estimation | Unverified | 0 |
| The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020) | Aug 3, 2020 | Natural Language Queries, Retrieval | Code Available | 1 |
| Self-supervised Motion Representation via Scattering Local Motion Cues | Aug 1, 2020 | Action Recognition, Optical Flow Estimation | Unverified | 0 |
| Uncertainty-based Traffic Accident Anticipation with Spatio-Temporal Relational Learning | Aug 1, 2020 | Accident Anticipation, Activity Prediction | Code Available | 1 |
| Detection and Localization of Robotic Tools in Robot-Assisted Surgery Videos Using Deep Neural Networks for Region Proposal and Detection | Jul 29, 2020 | Object Detection | Unverified | 0 |
| Perceptron Synthesis Network: Rethinking the Action Scale Variances in Videos | Jul 22, 2020 | Action Recognition, Temporal Action Localization | Unverified | 0 |
| MovieNet: A Holistic Dataset for Movie Understanding | Jul 21, 2020 | Video Understanding | Unverified | 0 |
| MotionSqueeze: Neural Motion Feature Learning for Video Understanding | Jul 20, 2020 | Action Classification, Action Recognition | Code Available | 1 |
| Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training | Jul 5, 2020 | Decoder, Question Answering | Unverified | 0 |
| Video Moment Localization using Object Evidence and Reverse Captioning | Jun 18, 2020 | Language-Based Temporal Localization, Language Modelling | Code Available | 1 |
| Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization | Jun 14, 2020 | Action Detection, Action Localization | Code Available | 1 |
| Video Understanding as Machine Translation | Jun 12, 2020 | Machine Translation, Metric Learning | Unverified | 0 |
| Large Scale Video Representation Learning via Relational Graph Clustering | Jun 1, 2020 | Clustering, Graph Clustering | Unverified | 0 |
| Screencast Tutorial Video Understanding | Jun 1, 2020 | Object Detection | Code Available | 0 |
| Temporal Aggregate Representations for Long-Range Video Understanding | Jun 1, 2020 | Action Anticipation, Action Recognition | Code Available | 1 |
| CARPe Posterum: A Convolutional Approach for Real-time Pedestrian Path Prediction | May 26, 2020 | Autonomous Vehicles, Prediction | Code Available | 0 |
| DramaQA: Character-Centered Video Story Understanding with Hierarchical QA | May 7, 2020 | Question Answering, Video Question Answering | Code Available | 0 |
| CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning | May 1, 2020 | Diagnostic, Object | Unverified | 0 |