| Neural Reasoning, Fast and Slow, for Video Question Answering | Jul 10, 2019 | Natural Questions, Question Answering | —Unverified | 0 |
| Learning to Rehearse in Long Sequence Memorization | Jun 2, 2021 | Memorization, Question Answering | —Unverified | 0 |
| Learning Trajectory-Word Alignments for Video-Language Tasks | Jan 5, 2023 | Question Answering, Retrieval | —Unverified | 0 |
| Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering | Mar 27, 2025 | Emotion Recognition, Question Answering | —Unverified | 0 |
| EVQAScore: Efficient Video Question Answering Data Evaluation | Nov 11, 2024 | Keyword Extraction, Question Answering | —Unverified | 0 |
| E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer | Nov 28, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| Everything Can Be Described in Words: A Simple Unified Multi-Modal Framework with Semantic and Temporal Alignment | Mar 12, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling | Oct 21, 2022 | Language Modeling, Language Modelling | —Unverified | 0 |
| LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval | May 21, 2025 | Autonomous Driving, Question Answering | —Unverified | 0 |
| LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering | Nov 29, 2021 | Diversity, Question Answering | —Unverified | 0 |
| Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments | Jan 1, 2021 | Question Answering, Video Question Answering | —Unverified | 0 |
| ENTER: Event Based Interpretable Reasoning for VideoQA | Jan 24, 2025 | Code Generation, EgoSchema | —Unverified | 0 |
| Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation | Jan 1, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning | Aug 15, 2024 | Answer Generation, Question-Answer-Generation | —Unverified | 0 |
| LLMs Meet Long Video: Advancing Long Video Question Answering with An Interactive Visual Adapter in LLMs | Feb 21, 2024 | Question Answering, Video Question Answering | —Unverified | 0 |
| Locate before Answering: Answer Guided Question Localization for Video Question Answering | Oct 5, 2022 | Question Answering, Video Question Answering | —Unverified | 0 |
| Admitting Ignorance Helps the Video Question Answering Models to Answer | Jan 15, 2025 | Question Answering, Video Question Answering | —Unverified | 0 |
| Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding | Mar 17, 2025 | Attribute, MME | —Unverified | 0 |
| Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Oct 9, 2024 | Audio captioning, Large Language Model | —Unverified | 0 |
| End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling | Jul 21, 2024 | Question Answering, Video Question Answering | —Unverified | 0 |
| Efficient Motion-Aware Video MLLM | Jan 1, 2025 | Question Answering, Video Question Answering | —Unverified | 0 |
| VUDG: A Dataset for Video Understanding Domain Generalization | May 30, 2025 | Domain Generalization, Multiple-choice | —Unverified | 0 |
| MarioQA: Answering Questions by Watching Gameplay Videos | Dec 6, 2016 | Question Answering, Video Question Answering | —Unverified | 0 |
| Measuring Compositional Consistency for Video Question Answering | Apr 14, 2022 | Question Answering, Video Question Answering | —Unverified | 0 |
| VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation | May 4, 2023 | Decoder, Question Answering | —Unverified | 0 |