| Learning the Predictability of the Future | Jun 19, 2021 | Representation Learning, Self-Supervised Action Recognition | Code Available | 1 |
| NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions | Jun 19, 2021 | Question Answering, Video Question Answering | Code Available | 1 |
| End-to-end Temporal Action Detection with Transformer | Jun 18, 2021 | Action Detection, Temporal Action Localization | Code Available | 1 |
| Isolated Sign Recognition from RGB Video using Pose Flow and Self-Attention | Jun 11, 2021 | Action Recognition, Sign Language Recognition | Code Available | 1 |
| VT-SSum: A Benchmark Dataset for Video Transcript Segmentation and Summarization | Jun 10, 2021 | Articles, Segmentation | Code Available | 1 |
| Technical Report: Temporal Aggregate Representations | Jun 6, 2021 | Action Anticipation, Action Recognition | Code Available | 1 |
| FineAction: A Fine-Grained Video Dataset for Temporal Action Localization | May 24, 2021 | Action Detection, Action Localization | Code Available | 1 |
| NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions | May 18, 2021 | Question Answering, Video Question Answering | Code Available | 1 |
| MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions | May 16, 2021 | Action Detection, Action Localization | Code Available | 1 |
| Stochastic Image-to-Video Synthesis using cINNs | May 10, 2021 | Diversity, Video Understanding | Code Available | 1 |
| FrameExit: Conditional Early Exiting for Efficient Video Recognition | Apr 27, 2021 | Video Recognition, Video Understanding | Code Available | 1 |
| CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | Apr 18, 2021 | Retrieval, Text Retrieval | Code Available | 1 |
| Crossover Learning for Fast Online Video Instance Segmentation | Apr 13, 2021 | Instance Segmentation, Semantic Segmentation | Code Available | 1 |
| TubeR: Tubelet Transformer for Video Action Detection | Apr 2, 2021 | Action Classification, Action Detection | Code Available | 1 |
| Visual Semantic Role Labeling for Video Understanding | Apr 2, 2021 | Semantic Role Labeling, Video Recognition | Code Available | 1 |
| Learning Salient Boundary Feature for Anchor-free Temporal Action Localization | Mar 24, 2021 | Action Localization, Temporal Action Localization | Code Available | 1 |
| Temporal Context Aggregation Network for Temporal Action Proposal Refinement | Mar 24, 2021 | Action Detection, Action Localization | Code Available | 1 |
| Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation | Mar 20, 2021 | Action Segmentation, Clustering | Code Available | 1 |
| Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | Feb 14, 2021 | Action Recognition, Temporal Action Localization | Code Available | 1 |
| Relaxed Transformer Decoders for Direct Action Proposal Generation | Feb 3, 2021 | Action Detection, Temporal Action Proposal Generation | Code Available | 1 |
| Occluded Video Instance Segmentation: A Benchmark | Feb 2, 2021 | Instance Segmentation, Segmentation | Code Available | 1 |
| TCLR: Temporal Contrastive Learning for Video Representation | Jan 20, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| TrackFormer: Multi-Object Tracking with Transformers | Jan 7, 2021 | Decoder, Multi-Object Tracking | Code Available | 1 |
| Learning Self-Similarity in Space and Time as a Generalized Motion for Action Recognition | Jan 1, 2021 | Action Recognition, Video Understanding | Code Available | 1 |
| A Comprehensive Study of Deep Video Action Recognition | Dec 11, 2020 | Action Recognition, Deep Learning | Code Available | 1 |
| End-to-End Video Instance Segmentation with Transformers | Nov 30, 2020 | Instance Segmentation, Segmentation | Code Available | 1 |
| SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos | Nov 26, 2020 | Action Spotting, Boundary Detection | Code Available | 1 |
| QuerYD: A video dataset with high-quality text and audio narrations | Nov 22, 2020 | Retrieval, Video Understanding | Code Available | 1 |
| Improved Actor Relation Graph based Group Activity Recognition | Oct 24, 2020 | Activity Recognition, Group Activity Recognition | Code Available | 1 |
| PAN: Towards Fast Action Recognition via Learning Persistence of Appearance | Aug 8, 2020 | Action Recognition, Optical Flow Estimation | Code Available | 1 |
| The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020) | Aug 3, 2020 | Natural Language Queries, Retrieval | Code Available | 1 |
| Uncertainty-based Traffic Accident Anticipation with Spatio-Temporal Relational Learning | Aug 1, 2020 | Accident Anticipation, Activity Prediction | Code Available | 1 |
| MotionSqueeze: Neural Motion Feature Learning for Video Understanding | Jul 20, 2020 | Action Classification, Action Recognition | Code Available | 1 |
| Video Moment Localization using Object Evidence and Reverse Captioning | Jun 18, 2020 | Language-Based Temporal Localization, Language Modelling | Code Available | 1 |
| Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization | Jun 14, 2020 | Action Detection, Action Localization | Code Available | 1 |
| Temporal Aggregate Representations for Long-Range Video Understanding | Jun 1, 2020 | Action Anticipation, Action Recognition | Code Available | 1 |
| Towards Visually Explaining Video Understanding Networks with Perturbation | May 1, 2020 | Video Understanding | Code Available | 1 |
| Top-1 Solution of Multi-Moments in Time Challenge 2019 | Mar 12, 2020 | Action Recognition, Video Understanding | Code Available | 1 |
| Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning | Mar 11, 2020 | Question Answering, Video Captioning | Code Available | 1 |
| Weakly Supervised Temporal Action Localization Using Deep Metric Learning | Jan 21, 2020 | Action Localization, Metric Learning | Code Available | 1 |
| Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video | Jan 18, 2020 | Decision Making, reinforcement-learning | Code Available | 1 |
| Temporal Interlacing Network | Jan 17, 2020 | Optical Flow Estimation, Video Understanding | Code Available | 1 |
| EEV: A Large-Scale Dataset for Studying Evoked Expressions from Video | Jan 15, 2020 | Diversity, Recommendation Systems | Code Available | 1 |
| A Multigrid Method for Efficiently Training Video Models | Dec 2, 2019 | Action Detection, Action Recognition | Code Available | 1 |
| CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning | Oct 10, 2019 | Diagnostic, Object | Code Available | 1 |
| Lightweight Network Architecture for Real-Time Action Recognition | May 21, 2019 | Action Recognition, CPU | Code Available | 1 |
| Large Scale Holistic Video Understanding | Apr 25, 2019 | Action Classification, Action Recognition | Code Available | 1 |
| TSM: Temporal Shift Module for Efficient Video Understanding | Nov 20, 2018 | 3D Action Recognition, Action Classification | Code Available | 1 |
| VirtualHome: Simulating Household Activities via Programs | Jun 19, 2018 | Video Understanding | Code Available | 1 |
| AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions | May 23, 2017 | Action Detection | Code Available | 1 |