| HLVU : A New Challenge to Test Deep Understanding of Movies the Way Humans do | May 1, 2020 | Video Understanding | Unverified | 0 |
| Towards Visually Explaining Video Understanding Networks with Perturbation | May 1, 2020 | Video Understanding | Code Available | 1 |
| Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube | Apr 29, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| DriftNet: Aggressive Driving Behavior Classification using 3D EfficientNet Architecture | Apr 18, 2020 | Anomaly Detection, Classification | Code Available | 0 |
| Knowledge-Based Visual Question Answering in Videos | Apr 17, 2020 | Question Answering, Video Question Answering | Unverified | 0 |
| Real-Time Segmentation Networks should be Latency Aware | Apr 6, 2020 | Autonomous Vehicles, Scene Segmentation | Unverified | 0 |
| Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries | Apr 3, 2020 | Referring Expression Segmentation, Video Segmentation | Unverified | 0 |
| Fully Automated Hand Hygiene Monitoring in Operating Room using 3D Convolutional Neural Network | Mar 20, 2020 | Optical Flow Estimation, Transfer Learning | Unverified | 0 |
| Beyond the Camera: Neural Networks in World Coordinates | Mar 12, 2020 | Action Recognition, Video Stabilization | Unverified | 0 |
| Top-1 Solution of Multi-Moments in Time Challenge 2019 | Mar 12, 2020 | Action Recognition, Video Understanding | Code Available | 1 |
| Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning | Mar 11, 2020 | Question Answering, Video Captioning | Code Available | 1 |
| CTM: Collaborative Temporal Modeling for Action Recognition | Feb 8, 2020 | Action Recognition, Video Understanding | Unverified | 0 |
| Weakly Supervised Temporal Action Localization Using Deep Metric Learning | Jan 21, 2020 | Action Localization, Metric Learning | Code Available | 1 |
| Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video | Jan 18, 2020 | Decision Making, reinforcement-learning | Code Available | 1 |
| Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data | Jan 17, 2020 | Graph Learning, Video Understanding | Unverified | 0 |
| Temporal Interlacing Network | Jan 17, 2020 | Optical Flow Estimation, Video Understanding | Code Available | 1 |
| EEV: A Large-Scale Dataset for Studying Evoked Expressions from Video | Jan 15, 2020 | Diversity, Recommendation Systems | Code Available | 1 |
| SoccerDB: A Large-Scale Database for Comprehensive Video Understanding | Dec 10, 2019 | Action Classification, Action Detection | Code Available | 0 |
| Video action detection by learning graph-based spatio-temporal interactions | Dec 9, 2019 | Action Detection, Action Localization | Code Available | 0 |
| VideoDG: Generalizing Temporal Relations in Videos to Novel Domains | Dec 8, 2019 | Action Recognition, Data Augmentation | Code Available | 0 |
| Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection | Dec 7, 2019 | object-detection, Object Detection | Code Available | 0 |
| A Context-Aware Loss Function for Action Spotting in Soccer Videos | Dec 3, 2019 | Action Spotting, Video Understanding | Code Available | 0 |
| BERT for Large-scale Video Segment Classification with Test-time Augmentation | Dec 2, 2019 | General Classification, Video Understanding | Unverified | 0 |
| A Multigrid Method for Efficiently Training Video Models | Dec 2, 2019 | Action Detection, Action Recognition | Code Available | 1 |
| AdapNet: Adaptability Decomposing Encoder-Decoder Network for Weakly Supervised Action Recognition and Localization | Nov 27, 2019 | Action Classification, Action Recognition | Unverified | 0 |
| Mimic The Raw Domain: Accelerating Action Recognition in the Compressed Domain | Nov 19, 2019 | Action Recognition, Video Recognition | Unverified | 0 |
| Cross-Class Relevance Learning for Temporal Concept Localization | Nov 19, 2019 | Feature Engineering, Video Understanding | Unverified | 0 |
| Multi-attention Networks for Temporal Localization of Video-level Labels | Nov 15, 2019 | Action Recognition, Temporal Action Localization | Code Available | 0 |
| Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding | Nov 1, 2019 | Action Detection, Action Recognition | Code Available | 0 |
| Comprehensive Video Understanding: Video summarization with content-based video recommender design | Oct 30, 2019 | Action Recognition, Data Augmentation | Unverified | 0 |
| MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization | Oct 27, 2019 | Knowledge Distillation, Video Understanding | Code Available | 0 |
| KnowIT VQA: Answering Knowledge-Based Questions about Videos | Oct 23, 2019 | Question Answering, Video Question Answering | Unverified | 0 |
| AFO-TAD: Anchor-free One-Stage Detector for Temporal Action Detection | Oct 18, 2019 | Action Detection, object-detection | Unverified | 0 |
| Tiny Video Networks | Oct 15, 2019 | CPU, GPU | Code Available | 0 |
| OmniTrack: Real-time detection and tracking of objects, text and logos in video | Oct 14, 2019 | GPU, object-detection | Unverified | 0 |
| CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning | Oct 10, 2019 | Diagnostic, Object | Code Available | 1 |
| ViP: Video Platform for PyTorch | Oct 7, 2019 | Benchmarking, Video Understanding | Code Available | 0 |
| A SPIKING SEQUENTIAL MODEL: RECURRENT LEAKY INTEGRATE-AND-FIRE | Sep 25, 2019 | Text Summarization, Video Understanding | Unverified | 0 |
| Question Answering is a Format; When is it Useful? | Sep 25, 2019 | Machine Translation, Question Answering | Unverified | 0 |
| Zero-Shot Action Recognition in Videos: A Survey | Sep 13, 2019 | Action Recognition, Action Recognition In Still Images | Unverified | 0 |
| Gaussian Temporal Awareness Networks for Action Localization | Sep 9, 2019 | Action Localization, object-detection | Code Available | 0 |
| Only Time Can Tell: Discovering Temporal Data for Temporal Modeling | Jul 19, 2019 | Benchmarking, Motion Estimation | Unverified | 0 |
| Localizing Unseen Activities in Video via Image Query | Jun 28, 2019 | Action Localization, Video Understanding | Unverified | 0 |
| UniDual: A Unified Model for Image and Video Understanding | Jun 10, 2019 | Multi-Task Learning, Video Understanding | Unverified | 0 |
| Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network | Jun 2, 2019 | General Classification, Graph Neural Network | Unverified | 0 |
| Creative Flow+ Dataset | Jun 1, 2019 | 3D Character Animation From A Single Photo, Depth Estimation | Code Available | 0 |
| Audio Caption in a Car Setting with a Sentence-Level Loss | May 31, 2019 | Audio captioning, Decoder | Code Available | 0 |
| AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | May 30, 2019 | Action Classification, Action Recognition | Code Available | 0 |
| Exploring Temporal Information for Improved Video Understanding | May 25, 2019 | Action Recognition, Optical Flow Estimation | Code Available | 0 |
| Lightweight Network Architecture for Real-Time Action Recognition | May 21, 2019 | Action Recognition, CPU | Code Available | 1 |