| Is Space-Time Attention All You Need for Video Understanding? | Feb 9, 2021 | Action Classification, Action Recognition | Code Available | 2 |
| HySTER: A Hybrid Spatio-Temporal Event Reasoner | Jan 17, 2021 | Inductive logic programming, Question Answering | Unverified | 0 |
| Recent Advances in Video Question Answering: A Review of Datasets and Methods | Jan 15, 2021 | Information Retrieval, Machine Translation | Unverified | 0 |
| End-to-End Video Question-Answer Generation with Generator-Pretester Network | Jan 5, 2021 | Answer Generation, Question-Answer-Generation | Code Available | 0 |
| Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments | Jan 1, 2021 | Question Answering, Video Question Answering | Unverified | 0 |
| HAIR: Hierarchical Visual-Semantic Relational Reasoning for Video Question Answering | Jan 1, 2021 | Question Answering, Relational Reasoning | Unverified | 0 |
| Video Question Answering Using Language-Guided Deep Compressed-Domain Video Feature | Jan 1, 2021 | Question Answering, Video Compression | Unverified | 0 |
| Trying Bilinear Pooling in Video-QA | Dec 18, 2020 | Question Answering, Video Question Answering | Unverified | 0 |
| On Modality Bias in the TVQA Dataset | Dec 18, 2020 | Question Answering, Video Question Answering | Code Available | 0 |
| CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions | Dec 8, 2020 | counterfactual, Descriptive | Code Available | 1 |
| Open-Ended Multi-Modal Relational Reasoning for Video Question Answering | Dec 1, 2020 | Question Answering, Relational Reasoning | Code Available | 0 |
| Just Ask: Learning to Answer Questions from Millions of Narrated Videos | Dec 1, 2020 | Question Answering, Question Generation | Code Available | 1 |
| iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Nov 16, 2020 | Common Sense Reasoning, Dense Video Captioning | Unverified | 0 |
| ActBERT: Learning Global-Local Video-Text Representations | Nov 14, 2020 | Action Segmentation, Question Answering | Code Available | 0 |
| Co-attentional Transformers for Story-Based Video Understanding | Oct 27, 2020 | Question Answering, Video Question Answering | Unverified | 0 |
| Hierarchical Conditional Relation Networks for Multimodal Video Question Answering | Oct 18, 2020 | Question Answering, Relation | Unverified | 0 |
| Self-supervised pre-training and contrastive representation learning for multiple-choice video QA | Sep 17, 2020 | Auxiliary Learning, Contrastive Learning | Unverified | 0 |
| Data augmentation techniques for the Video Question Answering task | Aug 22, 2020 | Data Augmentation, Question Answering | Unverified | 0 |
| Location-aware Graph Convolutional Networks for Video Question Answering | Aug 7, 2020 | Action Recognition, graph construction | Code Available | 1 |
| Video Question Answering on Screencast Tutorials | Aug 2, 2020 | Question Answering, Video Question Answering | Unverified | 0 |
| Visual Relation Grounding in Videos | Jul 17, 2020 | Question Answering, Relation | Code Available | 1 |
| Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions | Jul 17, 2020 | Question Answering, Video Question Answering | Code Available | 1 |
| What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets | Jul 7, 2020 | Multiple-choice, Question Answering | Unverified | 0 |
| Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training | Jul 5, 2020 | Decoder, Question Answering | Unverified | 0 |
| Modality Shifting Attention Network for Multi-modal Video Question Answering | Jul 4, 2020 | Question Answering, Temporal Localization | Unverified | 0 |
| Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA | May 13, 2020 | Image Captioning, Multi-Label Classification | Code Available | 1 |
| DramaQA: Character-Centered Video Story Understanding with Hierarchical QA | May 7, 2020 | Question Answering, Video Question Answering | Code Available | 0 |
| LifeQA: A Real-life Dataset for Video Question Answering | May 1, 2020 | Multiple-choice, Question Answering | Code Available | 1 |
| HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training | May 1, 2020 | Language Modeling, Language Modelling | Code Available | 1 |
| Knowledge-Based Visual Question Answering in Videos | Apr 17, 2020 | Question Answering, Video Question Answering | Unverified | 0 |
| Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning | Mar 6, 2020 | Density Estimation, Noise Estimation | Code Available | 0 |
| Hierarchical Conditional Relation Networks for Video Question Answering | Feb 25, 2020 | Audio-Visual Question Answering (AVQA), Question Answering | Code Available | 1 |
| Multimodal Transformer with Pointer Network for the DSTC8 AVSD Challenge | Feb 25, 2020 | Question Answering, Video Question Answering | Unverified | 0 |
| TutorialVQA: Question Answering Dataset for Tutorial Videos | Dec 2, 2019 | Question Answering, Video Question Answering | Code Available | 0 |
| Video Dialog via Progressive Inference and Cross-Transformer | Nov 1, 2019 | Answer Generation, Question Answering | Unverified | 0 |
| KnowIT VQA: Answering Knowledge-Based Questions about Videos | Oct 23, 2019 | Question Answering, Video Question Answering | Unverified | 0 |
| A Better Way to Attend: Attention with Trees for Video Question Answering | Sep 5, 2019 | Question Answering, Video Question Answering | Code Available | 0 |
| Learning Question-Guided Video Representation for Multi-Turn Video Question Answering | Jul 31, 2019 | Navigate, Question Answering | Unverified | 0 |
| OmniNet: A unified architecture for multi-modal multi-task learning | Jul 17, 2019 | Image Captioning, Multi-Task Learning | Code Available | 0 |
| Neural Reasoning, Fast and Slow, for Video Question Answering | Jul 10, 2019 | Natural Questions, Question Answering | Unverified | 0 |
| Video Question Generation via Cross-Modal Self-Attention Networks Learning | Jul 5, 2019 | Diversity, Question Answering | Unverified | 0 |
| Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks | Jun 28, 2019 | Answer Generation, Decoder | Unverified | 0 |
| Adversarial Multimodal Network for Movie Question Answering | Jun 24, 2019 | Question Answering, Video Question Answering | Unverified | 0 |
| ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering | Jun 6, 2019 | Question Answering, Video Question Answering | Code Available | 0 |
| Gaining Extra Supervision via Multi-task learning for Multi-Modal Video Question Answering | May 28, 2019 | Inductive Bias, Metric Learning | Unverified | 0 |
| TVQA+: Spatio-Temporal Grounding for Video Question Answering | Apr 25, 2019 | Question Answering, Video Question Answering | Code Available | 0 |
| Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering | Apr 8, 2019 | Question Answering, Video Question Answering | Code Available | 0 |
| Holistic Multi-modal Memory Network for Movie Question Answering | Nov 12, 2018 | Question Answering, Retrieval | Unverified | 0 |
| TVQA: Localized, Compositional Video Question Answering | Sep 5, 2018 | Video Question Answering | Code Available | 0 |
| A Joint Sequence Fusion Model for Video Question Answering and Retrieval | Aug 7, 2018 | Decoder, Multiple-choice | Code Available | 0 |