| Paper | Date | Tasks | Code | |
| --- | --- | --- | --- | --- |
| Is Space-Time Attention All You Need for Video Understanding? | Feb 9, 2021 | Action Classification, Action Recognition | Code Available | 2 |
| HySTER: A Hybrid Spatio-Temporal Event Reasoner | Jan 17, 2021 | Inductive logic programming, Question Answering | Unverified | 0 |
| Recent Advances in Video Question Answering: A Review of Datasets and Methods | Jan 15, 2021 | Information Retrieval, Machine Translation | Unverified | 0 |
| End-to-End Video Question-Answer Generation with Generator-Pretester Network | Jan 5, 2021 | Answer Generation, Question-Answer-Generation | Code Available | 0 |
| Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments | Jan 1, 2021 | Question Answering, Video Question Answering | Unverified | 0 |
| HAIR: Hierarchical Visual-Semantic Relational Reasoning for Video Question Answering | Jan 1, 2021 | Question Answering, Relational Reasoning | Unverified | 0 |
| Video Question Answering Using Language-Guided Deep Compressed-Domain Video Feature | Jan 1, 2021 | Question Answering, Video Compression | Unverified | 0 |
| Trying Bilinear Pooling in Video-QA | Dec 18, 2020 | Question Answering, Video Question Answering | Unverified | 0 |
| On Modality Bias in the TVQA Dataset | Dec 18, 2020 | Question Answering, Video Question Answering | Code Available | 0 |
| CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions | Dec 8, 2020 | counterfactual, Descriptive | Code Available | 1 |
| Open-Ended Multi-Modal Relational Reasoning for Video Question Answering | Dec 1, 2020 | Question Answering, Relational Reasoning | Code Available | 0 |
| Just Ask: Learning to Answer Questions from Millions of Narrated Videos | Dec 1, 2020 | Question Answering, Question Generation | Code Available | 1 |
| iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Nov 16, 2020 | Common Sense Reasoning, Dense Video Captioning | Unverified | 0 |
| ActBERT: Learning Global-Local Video-Text Representations | Nov 14, 2020 | Action Segmentation, Question Answering | Code Available | 0 |
| Co-attentional Transformers for Story-Based Video Understanding | Oct 27, 2020 | Question Answering, Video Question Answering | Unverified | 0 |
| Hierarchical Conditional Relation Networks for Multimodal Video Question Answering | Oct 18, 2020 | Question Answering, Relation | Unverified | 0 |
| Self-supervised pre-training and contrastive representation learning for multiple-choice video QA | Sep 17, 2020 | Auxiliary Learning, Contrastive Learning | Unverified | 0 |
| Data augmentation techniques for the Video Question Answering task | Aug 22, 2020 | Data Augmentation, Question Answering | Unverified | 0 |
| Location-aware Graph Convolutional Networks for Video Question Answering | Aug 7, 2020 | Action Recognition, graph construction | Code Available | 1 |
| Video Question Answering on Screencast Tutorials | Aug 2, 2020 | Question Answering, Video Question Answering | Unverified | 0 |
| Visual Relation Grounding in Videos | Jul 17, 2020 | Question Answering, Relation | Code Available | 1 |
| Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions | Jul 17, 2020 | Question Answering, Video Question Answering | Code Available | 1 |
| What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets | Jul 7, 2020 | Multiple-choice, Question Answering | Unverified | 0 |
| Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training | Jul 5, 2020 | Decoder, Question Answering | Unverified | 0 |
| Modality Shifting Attention Network for Multi-modal Video Question Answering | Jul 4, 2020 | Question Answering, Temporal Localization | Unverified | 0 |