| Representation Learning on Visual-Symbolic Graphs for Video Understanding | May 17, 2019 | Action Classification, Action Detection | Unverified | 0 |
| Video Instance Segmentation | May 12, 2019 | Instance Segmentation, Segmentation | Code Available | 2 |
| Large Scale Holistic Video Understanding | Apr 25, 2019 | Action Classification, Action Recognition | Code Available | 1 |
| Recurrent Space-time Graph Neural Networks | Apr 11, 2019 | Action Recognition, Human-Object Interaction Detection | Code Available | 0 |
| Constructing Hierarchical Q&A Datasets for Video Story Understanding | Apr 1, 2019 | Video Understanding | Unverified | 0 |
| Wasserstein Dependency Measure for Representation Learning | Mar 28, 2019 | Object Recognition, reinforcement-learning | Unverified | 0 |
| 4D Generic Video Object Proposals | Jan 26, 2019 | Instance Segmentation, Object | Code Available | 0 |
| DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition | Jan 11, 2019 | Action Classification, Action Recognition | Unverified | 0 |
| Future semantic segmentation of time-lapsed videos with large temporal displacement | Dec 27, 2018 | Segmentation, Semantic Segmentation | Unverified | 0 |
| Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition | Dec 13, 2018 | 3D Action Recognition, Action Recognition | Unverified | 0 |
| Long-Term Feature Banks for Detailed Video Understanding | Dec 12, 2018 | Action Classification, Action Recognition | Code Available | 0 |
| A Structured Model For Action Detection | Dec 9, 2018 | Action Detection, model | Unverified | 0 |
| An Attempt towards Interpretable Audio-Visual Video Captioning | Dec 7, 2018 | Audio captioning, Audio-Visual Video Captioning | Unverified | 0 |
| The Visual Centrifuge: Model-Free Layered Video Representations | Dec 4, 2018 | Color Constancy, model | Code Available | 0 |
| How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos | Dec 2, 2018 | Logical Reasoning, Question Answering | Unverified | 0 |
| Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction | Nov 28, 2018 | Action Recognition, Prediction | Unverified | 0 |
| Integrated Object Detection and Tracking with Tracklet-Conditioned Detection | Nov 27, 2018 | Object, object-detection | Unverified | 0 |
| Efficient Video Understanding via Layered Multi Frame-Rate Analysis | Nov 24, 2018 | Autonomous Driving, Video Understanding | Unverified | 0 |
| TSM: Temporal Shift Module for Efficient Video Understanding | Nov 20, 2018 | 3D Action Recognition, Action Classification | Code Available | 1 |
| NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification | Nov 12, 2018 | Efficient Neural Network, General Classification | Code Available | 0 |
| Random Temporal Skipping for Multirate Video Analysis | Oct 30, 2018 | Action Recognition, Optical Flow Estimation | Unverified | 0 |
| Morph: Flexible Acceleration for 3D CNN-based Video Understanding | Oct 16, 2018 | MORPH, Video Recognition | Unverified | 0 |
| Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images | Oct 4, 2018 | Domain Adaptation, Image-to-Image Translation | Code Available | 0 |
| Representation Flow for Action Recognition | Oct 2, 2018 | Action Classification, Action Recognition | Code Available | 0 |
| Learnable Pooling Methods for Video Classification | Oct 1, 2018 | Classification, General Classification | Code Available | 0 |
| Non-local NetVLAD Encoding for Video Classification | Sep 29, 2018 | Classification, General Classification | Unverified | 0 |
| Large-Scale Video Classification with Feature Space Augmentation coupled with Learned Label Relations and Ensembling | Sep 21, 2018 | General Classification, Video Classification | Unverified | 0 |
| Label Denoising with Large Ensembles of Heterogeneous Neural Networks | Sep 12, 2018 | Data Augmentation, Denoising | Unverified | 0 |
| Localizing Moments in Video with Temporal Language | Sep 5, 2018 | Natural Language Queries, Retrieval | Code Available | 0 |
| End-to-End Joint Semantic Segmentation of Actors and Actions in Video | Sep 1, 2018 | Action Recognition, Segmentation | Unverified | 0 |
| Teaching Machines to Understand Baseball Games: Large-Scale Baseball Video Database for Multiple Video Understanding Tasks | Sep 1, 2018 | Video Alignment, Video Recognition | Unverified | 0 |
| Constrained-size Tensorflow Models for YouTube-8M Video Understanding Challenge | Aug 21, 2018 | Video Understanding | Code Available | 0 |
| Diagnosing Error in Temporal Action Detectors | Jul 27, 2018 | Action Localization, Diagnostic | Code Available | 0 |
| Video Time: Properties, Encoders and Evaluation | Jul 18, 2018 | Video Understanding | Unverified | 0 |
| Query-Conditioned Three-Player Adversarial Network for Video Summarization | Jul 17, 2018 | Generative Adversarial Network, Video Summarization | Unverified | 0 |
| When Work Matters: Transforming Classical Network Structures to Graph CNN | Jul 7, 2018 | Graph Classification, Video Understanding | Unverified | 0 |
| Deep Spatio-Temporal Random Fields for Efficient Video Segmentation | Jul 3, 2018 | Instance Segmentation, Semantic Segmentation | Unverified | 0 |
| Long Activity Video Understanding using Functional Object-Oriented Network | Jul 3, 2018 | Object, Video Understanding | Unverified | 0 |
| Exploiting Spatial-Temporal Modelling and Multi-Modal Fusion for Human Action Recognition | Jun 27, 2018 | Action Recognition, Temporal Action Localization | Unverified | 0 |
| VirtualHome: Simulating Household Activities via Programs | Jun 19, 2018 | Video Understanding | Code Available | 1 |
| Massively Parallel Video Networks | Jun 11, 2018 | Action Recognition, Temporal Action Localization | Unverified | 0 |
| Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning | Jun 1, 2018 | Action Recognition, Representation Learning | Unverified | 0 |
| What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets | Jun 1, 2018 | Video Understanding | Unverified | 0 |
| DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding | May 19, 2018 | Action Recognition In Videos, Gesture Recognition | Unverified | 0 |
| Fast Retinomorphic Event Stream for Video Recognition and Reinforcement Learning | May 16, 2018 | Action Recognition, Atari Games | Unverified | 0 |
| Dilated Temporal Relational Adversarial Network for Generic Video Summarization | Apr 30, 2018 | Generative Adversarial Network, Video Summarization | Unverified | 0 |
| Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos | Apr 25, 2018 | General Classification, Video Classification | Unverified | 0 |
| ECO: Efficient Convolutional Network for Online Video Understanding | Apr 24, 2018 | Action Classification, Action Recognition | Code Available | 0 |
| Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning | Apr 15, 2018 | Video Captioning, Video Understanding | Code Available | 0 |
| End-to-End Learning of Motion Representation for Video Understanding | Apr 2, 2018 | Action Recognition, Optical Flow Estimation | Code Available | 0 |