| Learning Question-Guided Video Representation for Multi-Turn Video Question Answering | Jul 31, 2019 | Navigate, Question Answering | —Unverified | 0 | 0 |
| Adversarial Multimodal Network for Movie Question Answering | Jun 24, 2019 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Advancing Egocentric Video Question Answering with Multimodal Large Language Models | Apr 6, 2025 | Object Recognition, Question Answering | —Unverified | 0 | 0 |
| Neural Reasoning, Fast and Slow, for Video Question Answering | Jul 10, 2019 | Natural Questions, Question Answering | —Unverified | 0 | 0 |
| Learning to Rehearse in Long Sequence Memorization | Jun 2, 2021 | Memorization, Question Answering | —Unverified | 0 | 0 |
| Learning Trajectory-Word Alignments for Video-Language Tasks | Jan 5, 2023 | Question Answering, Retrieval | —Unverified | 0 | 0 |
| Leveraging Static Relationships for Intra-Type and Inter-Type Message Passing in Video Question Answering | Apr 3, 2025 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering | Mar 27, 2025 | Emotion Recognition, Question Answering | —Unverified | 0 | 0 |
| Leveraging Video Descriptions to Learn Video Question Answering | Nov 12, 2016 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| VideoLLM-online: Online Video Large Language Model for Streaming Video | Jun 17, 2024 | GPU, Language Modeling | —Unverified | 0 | 0 |
| EVQAScore: Efficient Video Question Answering Data Evaluation | Nov 11, 2024 | Keyword Extraction, Question Answering | —Unverified | 0 | 0 |
| E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer | Nov 28, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Everything Can Be Described in Words: A Simple Unified Multi-Modal Framework with Semantic and Temporal Alignment | Mar 12, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling | Oct 21, 2022 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval | May 21, 2025 | Autonomous Driving, Question Answering | —Unverified | 0 | 0 |
| LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering | Nov 29, 2021 | Diversity, Question Answering | —Unverified | 0 | 0 |
| Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments | Jan 1, 2021 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| ENTER: Event Based Interpretable Reasoning for VideoQA | Jan 24, 2025 | Code Generation, EgoSchema | —Unverified | 0 | 0 |
| Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation | Jan 1, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning | Aug 15, 2024 | Answer Generation, Question-Answer-Generation | —Unverified | 0 | 0 |
| LLMs Meet Long Video: Advancing Long Video Question Answering with An Interactive Visual Adapter in LLMs | Feb 21, 2024 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Locate before Answering: Answer Guided Question Localization for Video Question Answering | Oct 5, 2022 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Admitting Ignorance Helps the Video Question Answering Models to Answer | Jan 15, 2025 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding | Mar 17, 2025 | Attribute, MME | —Unverified | 0 | 0 |
| Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Oct 9, 2024 | Audio captioning, Large Language Model | —Unverified | 0 | 0 |