| Co-attentional Transformers for Story-Based Video Understanding | Oct 27, 2020 | Question Answering, Video Question Answering | Unverified | 0 |
| Improved Actor Relation Graph based Group Activity Recognition | Oct 24, 2020 | Activity Recognition, Group Activity Recognition | Code Available | 1 |
| Egok360: A 360 Egocentric Kinetic Human Activity Video Dataset | Oct 15, 2020 | Activity Recognition, Egocentric Activity Recognition | Unverified | 0 |
| Video Action Understanding | Oct 13, 2020 | Action Understanding, Deep Learning | Code Available | 0 |
| Global Self-Attention Networks for Image Recognition | Oct 6, 2020 | Video Understanding | Unverified | 0 |
| Features Understanding in 3D CNNs for Actions Recognition in Video | Oct 1, 2020 | Action Recognition, Decision Making | Code Available | 0 |
| PAN: Towards Fast Action Recognition via Learning Persistence of Appearance | Aug 8, 2020 | Action Recognition, Optical Flow Estimation | Code Available | 1 |
| Residual Frames with Efficient Pseudo-3D CNN for Human Action Recognition | Aug 3, 2020 | Action Recognition, Optical Flow Estimation | Unverified | 0 |
| The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020) | Aug 3, 2020 | Natural Language Queries, Retrieval | Code Available | 1 |
| Self-supervised Motion Representation via Scattering Local Motion Cues | Aug 1, 2020 | Action Recognition, Optical Flow Estimation | Unverified | 0 |
| Uncertainty-based Traffic Accident Anticipation with Spatio-Temporal Relational Learning | Aug 1, 2020 | Accident Anticipation, Activity Prediction | Code Available | 1 |
| Detection and Localization of Robotic Tools in Robot-Assisted Surgery Videos Using Deep Neural Networks for Region Proposal and Detection | Jul 29, 2020 | object-detection, Object Detection | Unverified | 0 |
| Perceptron Synthesis Network: Rethinking the Action Scale Variances in Videos | Jul 22, 2020 | Action Recognition, Temporal Action Localization | Unverified | 0 |
| MovieNet: A Holistic Dataset for Movie Understanding | Jul 21, 2020 | Video Understanding | Unverified | 0 |
| MotionSqueeze: Neural Motion Feature Learning for Video Understanding | Jul 20, 2020 | Action Classification, Action Recognition | Code Available | 1 |
| Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training | Jul 5, 2020 | Decoder, Question Answering | Unverified | 0 |
| Video Moment Localization using Object Evidence and Reverse Captioning | Jun 18, 2020 | Language-Based Temporal Localization, Language Modelling | Code Available | 1 |
| Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization | Jun 14, 2020 | Action Detection, Action Localization | Code Available | 1 |
| Video Understanding as Machine Translation | Jun 12, 2020 | Machine Translation, Metric Learning | Unverified | 0 |
| Large Scale Video Representation Learning via Relational Graph Clustering | Jun 1, 2020 | Clustering, Graph Clustering | Unverified | 0 |
| Screencast Tutorial Video Understanding | Jun 1, 2020 | object-detection, Object Detection | Code Available | 0 |
| Temporal Aggregate Representations for Long-Range Video Understanding | Jun 1, 2020 | Action Anticipation, Action Recognition | Code Available | 1 |
| CARPe Posterum: A Convolutional Approach for Real-time Pedestrian Path Prediction | May 26, 2020 | Autonomous Vehicles, Prediction | Code Available | 0 |
| DramaQA: Character-Centered Video Story Understanding with Hierarchical QA | May 7, 2020 | Question Answering, Video Question Answering | Code Available | 0 |
| CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning | May 1, 2020 | Diagnostic, Object | Unverified | 0 |