| HLVU : A New Challenge to Test Deep Understanding of Movies the Way Humans do | May 1, 2020 | Video Understanding | Unverified | 0 |
| Towards Visually Explaining Video Understanding Networks with Perturbation | May 1, 2020 | Video Understanding | Code Available | 1 |
| Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube | Apr 29, 2020 | Automatic Speech Recognition (ASR) | Code Available | 0 |
| DriftNet: Aggressive Driving Behavior Classification using 3D EfficientNet Architecture | Apr 18, 2020 | Anomaly Detection, Classification | Code Available | 0 |
| Knowledge-Based Visual Question Answering in Videos | Apr 17, 2020 | Question Answering, Video Question Answering | Unverified | 0 |
| Real-Time Segmentation Networks should be Latency Aware | Apr 6, 2020 | Autonomous Vehicles, Scene Segmentation | Unverified | 0 |
| Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries | Apr 3, 2020 | Referring Expression Segmentation, Video Segmentation | Unverified | 0 |
| Fully Automated Hand Hygiene Monitoring in Operating Room using 3D Convolutional Neural Network | Mar 20, 2020 | Optical Flow Estimation, Transfer Learning | Unverified | 0 |
| Beyond the Camera: Neural Networks in World Coordinates | Mar 12, 2020 | Action Recognition, Video Stabilization | Unverified | 0 |
| Top-1 Solution of Multi-Moments in Time Challenge 2019 | Mar 12, 2020 | Action Recognition, Video Understanding | Code Available | 1 |
| Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning | Mar 11, 2020 | Question Answering, Video Captioning | Code Available | 1 |
| CTM: Collaborative Temporal Modeling for Action Recognition | Feb 8, 2020 | Action Recognition, Video Understanding | Unverified | 0 |
| Weakly Supervised Temporal Action Localization Using Deep Metric Learning | Jan 21, 2020 | Action Localization, Metric Learning | Code Available | 1 |
| Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video | Jan 18, 2020 | Decision Making, Reinforcement Learning | Code Available | 1 |
| Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data | Jan 17, 2020 | Graph Learning, Video Understanding | Unverified | 0 |
| Temporal Interlacing Network | Jan 17, 2020 | Optical Flow Estimation, Video Understanding | Code Available | 1 |
| EEV: A Large-Scale Dataset for Studying Evoked Expressions from Video | Jan 15, 2020 | Diversity, Recommendation Systems | Code Available | 1 |
| SoccerDB: A Large-Scale Database for Comprehensive Video Understanding | Dec 10, 2019 | Action Classification, Action Detection | Code Available | 0 |
| Video action detection by learning graph-based spatio-temporal interactions | Dec 9, 2019 | Action Detection, Action Localization | Code Available | 0 |
| VideoDG: Generalizing Temporal Relations in Videos to Novel Domains | Dec 8, 2019 | Action Recognition, Data Augmentation | Code Available | 0 |
| Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection | Dec 7, 2019 | Object Detection | Code Available | 0 |
| A Context-Aware Loss Function for Action Spotting in Soccer Videos | Dec 3, 2019 | Action Spotting, Video Understanding | Code Available | 0 |
| BERT for Large-scale Video Segment Classification with Test-time Augmentation | Dec 2, 2019 | General Classification, Video Understanding | Unverified | 0 |
| A Multigrid Method for Efficiently Training Video Models | Dec 2, 2019 | Action Detection, Action Recognition | Code Available | 1 |
| AdapNet: Adaptability Decomposing Encoder-Decoder Network for Weakly Supervised Action Recognition and Localization | Nov 27, 2019 | Action Classification, Action Recognition | Unverified | 0 |