| HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation | Nov 27, 2024 | Graph Generation, Question Answering | —Unverified | 0 |
| HySTER: A Hybrid Spatio-Temporal Event Reasoner | Jan 17, 2021 | Inductive logic programming, Question Answering | —Unverified | 0 |
| In-the-Wild Video Question Answering | Oct 1, 2022 | Evidence Selection, Question Answering | —Unverified | 0 |
| Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering | Jul 3, 2024 | Contrastive Learning, Language Modelling | —Unverified | 0 |
| iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Nov 16, 2020 | Common Sense Reasoning, Dense Video Captioning | —Unverified | 0 |
| IQViC: In-context, Question Adaptive Vision Compressor for Long-term Video Understanding LMMs | Dec 13, 2024 | Question Answering, Video Question Answering | —Unverified | 0 |
| iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability | Jun 25, 2021 | Bias Detection, Question Answering | —Unverified | 0 |
| Is a Video worth n×n Images? A Highly Efficient Approach to Transformer-based Video Question Answering | May 16, 2023 | Question Answering, Video Question Answering | —Unverified | 0 |
| Zero-Shot Video Question Answering with Procedural Programs | Dec 1, 2023 | Code Generation, Language Modeling | —Unverified | 0 |
| KeyVideoLLM: Towards Large-scale Video Keyframe Selection | Jul 3, 2024 | Data Compression, Management | —Unverified | 0 |
| Keyword-Aware Relative Spatio-Temporal Graph Networks for Video Question Answering | Jul 25, 2023 | graph construction, Question Answering | —Unverified | 0 |
| KnowIT VQA: Answering Knowledge-Based Questions about Videos | Oct 23, 2019 | Question Answering, Video Question Answering | —Unverified | 0 |
| Video Instruction Tuning With Synthetic Data | Oct 3, 2024 | 3D Question Answering (3D-QA) | —Unverified | 0 |
| Knowledge-Based Visual Question Answering in Videos | Apr 17, 2020 | Question Answering, Video Question Answering | —Unverified | 0 |
| Knowledge Proxy Intervention for Deconfounded Video Question Answering | Jan 1, 2023 | Question Answering, Video Question Answering | —Unverified | 0 |
| Koala: Key frame-conditioned long video-LLM | Apr 5, 2024 | Action Recognition, Question Answering | —Unverified | 0 |
| Language-aware Visual Semantic Distillation for Video Question Answering | Jan 1, 2024 | Answer Generation, Question Answering | —Unverified | 0 |
| Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering | Apr 7, 2023 | Question Answering, Question Generation | —Unverified | 0 |
| (2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering | Feb 18, 2022 | Question Answering, Spatio-temporal Scene Graphs | —Unverified | 0 |
| Video Language Co-Attention with Multimodal Fast-Learning Feature Fusion for VideoQA | May 1, 2022 | Question Answering, Video Question Answering | —Unverified | 0 |
| AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning | Mar 30, 2021 | Question Answering, Video Question Answering | —Unverified | 0 |
| Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering | Dec 1, 2021 | Multimodal Reasoning, Question Answering | —Unverified | 0 |
| Learning Question-Guided Video Representation for Multi-Turn Video Question Answering | Jul 31, 2019 | Navigate, Question Answering | —Unverified | 0 |
| Adversarial Multimodal Network for Movie Question Answering | Jun 24, 2019 | Question Answering, Video Question Answering | —Unverified | 0 |
| Advancing Egocentric Video Question Answering with Multimodal Large Language Models | Apr 6, 2025 | Object Recognition, Question Answering | —Unverified | 0 |