| Modality Alignment between Deep Representations for Effective Video-and-Language Learning | Jun 1, 2022 | Question Answering, Video Captioning | —Unverified | 0 |
| Modality Shifting Attention Network for Multi-modal Video Question Answering | Jul 4, 2020 | Question Answering, Temporal Localization | —Unverified | 0 |
| Modeling Semantic Composition with Syntactic Hypergraph for Video Question Answering | May 13, 2022 | Question Answering, Semantic Composition | —Unverified | 0 |
| Modular Blended Attention Network for Video Question Answering | Nov 2, 2023 | Question Answering, Video Question Answering | —Unverified | 0 |
| MoReVQA: Exploring Modular Reasoning Models for Video Question Answering | Apr 9, 2024 | EgoSchema, Multiple-choice | —Unverified | 0 |
| Motion-Appearance Co-Memory Networks for Video Question Answering | Mar 29, 2018 | Question Answering, Video Question Answering | —Unverified | 0 |
| Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering | Aug 11, 2021 | Language Modeling, Language Modelling | —Unverified | 0 |
| Diversifying Joint Vision-Language Tokenization Learning | Jun 6, 2023 | Question Answering, Representation Learning | —Unverified | 0 |
| Distraction-free Embeddings for Robust VQA | Aug 31, 2023 | Question Answering, Video Question Answering | —Unverified | 0 |
| Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents | Apr 25, 2018 | Question Answering, Video Question Answering | —Unverified | 0 |
| MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding | Dec 8, 2023 | Form, Question Answering | —Unverified | 0 |
| Discovering the Real Association: Multimodal Causal Reasoning in Video Question Answering | Jan 1, 2023 | Question Answering, Video Question Answering | —Unverified | 0 |
| Dense but Efficient VideoQA for Intricate Compositional Reasoning | Oct 19, 2022 | Question Answering, Video Question Answering | —Unverified | 0 |
| MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling | Mar 10, 2023 | Multi-Label Classification | —Unverified | 0 |
| VideoPrism: A Foundational Visual Encoder for Video Understanding | Feb 20, 2024 | Question Answering, Video Question Answering | —Unverified | 0 |
| Multi-Modal Retrieval Augmentation for Open-Ended and Knowledge-Intensive Video Question Answering | Feb 17, 2025 | Multiple-choice, Question Answering | —Unverified | 0 |
| Multimodal Transformer with Pointer Network for the DSTC8 AVSD Challenge | Feb 25, 2020 | Question Answering, Video Question Answering | —Unverified | 0 |
| Multi-object event graph representation learning for Video Question Answering | Sep 12, 2024 | Contrastive Learning, Graph Representation Learning | —Unverified | 0 |
| Multi-Scale Progressive Attention Network for Video Question Answering | Aug 1, 2021 | Question Answering, Relational Reasoning | —Unverified | 0 |
| Data augmentation techniques for the Video Question Answering task | Aug 22, 2020 | Data Augmentation, Question Answering | —Unverified | 0 |
| Neural-Symbolic VideoQA: Learning Compositional Spatio-Temporal Reasoning for Real-world Video Question Answering | Apr 5, 2024 | Question Answering, Video Question Answering | —Unverified | 0 |
| NEWSKVQA: Knowledge-Aware News Video Question Answering | Feb 8, 2022 | Common Sense Reasoning, Management | —Unverified | 0 |
| VideoQA-SC: Adaptive Semantic Communication for Video Question Answering | May 17, 2024 | Question Answering, Semantic Communication | —Unverified | 0 |
| CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning | Apr 1, 2021 | Question Answering, Representation Learning | —Unverified | 0 |
| Object-Centric Representation Learning for Video Question Answering | Apr 12, 2021 | Object, Question Answering | —Unverified | 0 |