| Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering | Mar 27, 2025 | Emotion Recognition, Question Answering | —Unverified | 0 | 0 |
| Leveraging Video Descriptions to Learn Video Question Answering | Nov 12, 2016 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| VideoLLM-online: Online Video Large Language Model for Streaming Video | Jun 17, 2024 | GPU, Language Modeling | —Unverified | 0 | 0 |
| EVQAScore: Efficient Video Question Answering Data Evaluation | Nov 11, 2024 | Keyword Extraction, Question Answering | —Unverified | 0 | 0 |
| E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer | Nov 28, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Everything Can Be Described in Words: A Simple Unified Multi-Modal Framework with Semantic and Temporal Alignment | Mar 12, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling | Oct 21, 2022 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval | May 21, 2025 | Autonomous Driving, Question Answering | —Unverified | 0 | 0 |
| LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering | Nov 29, 2021 | Diversity, Question Answering | —Unverified | 0 | 0 |
| Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments | Jan 1, 2021 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| ENTER: Event Based Interpretable Reasoning for VideoQA | Jan 24, 2025 | Code Generation, EgoSchema | —Unverified | 0 | 0 |
| Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation | Jan 1, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning | Aug 15, 2024 | Answer Generation, Question-Answer-Generation | —Unverified | 0 | 0 |
| LLMs Meet Long Video: Advancing Long Video Question Answering with An Interactive Visual Adapter in LLMs | Feb 21, 2024 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Locate before Answering: Answer Guided Question Localization for Video Question Answering | Oct 5, 2022 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Admitting Ignorance Helps the Video Question Answering Models to Answer | Jan 15, 2025 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding | Mar 17, 2025 | Attribute, MME | —Unverified | 0 | 0 |
| Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Oct 9, 2024 | Audio captioning, Large Language Model | —Unverified | 0 | 0 |
| End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling | Jul 21, 2024 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Efficient Motion-Aware Video MLLM | Jan 1, 2025 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| VUDG: A Dataset for Video Understanding Domain Generalization | May 30, 2025 | Domain Generalization, Multiple-choice | —Unverified | 0 | 0 |
| MarioQA: Answering Questions by Watching Gameplay Videos | Dec 6, 2016 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Measuring Compositional Consistency for Video Question Answering | Apr 14, 2022 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation | May 4, 2023 | Decoder, Question Answering | —Unverified | 0 | 0 |
| VideoOrion: Tokenizing Object Dynamics in Videos | Nov 25, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Dynamic Multistep Reasoning based on Video Scene Graph for Video Question Answering | Jul 1, 2022 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities | Nov 9, 2023 | Action Classification, Audio Classification | —Unverified | 0 | 0 |
| AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Jan 1, 2025 | GPU, Question Answering | —Unverified | 0 | 0 |
| M-LLM Based Video Frame Selection for Efficient Video Understanding | Feb 27, 2025 | EgoSchema, Language Modeling | —Unverified | 0 | 0 |
| MMTF: Multi-Modal Temporal Fusion for Commonsense Video Question Answering | Oct 6, 2023 | counterfactual, Question Answering | —Unverified | 0 | 0 |
| Modality Alignment between Deep Representations for Effective Video-and-Language Learning | Jun 1, 2022 | Question Answering, Video Captioning | —Unverified | 0 | 0 |
| Modality Shifting Attention Network for Multi-modal Video Question Answering | Jul 4, 2020 | Question Answering, Temporal Localization | —Unverified | 0 | 0 |
| Modeling Semantic Composition with Syntactic Hypergraph for Video Question Answering | May 13, 2022 | Question Answering, Semantic Composition | —Unverified | 0 | 0 |
| Modular Blended Attention Network for Video Question Answering | Nov 2, 2023 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| MoReVQA: Exploring Modular Reasoning Models for Video Question Answering | Apr 9, 2024 | EgoSchema, Multiple-choice | —Unverified | 0 | 0 |
| Motion-Appearance Co-Memory Networks for Video Question Answering | Mar 29, 2018 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering | Aug 11, 2021 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Diversifying Joint Vision-Language Tokenization Learning | Jun 6, 2023 | Question Answering, Representation Learning | —Unverified | 0 | 0 |
| Distraction-free Embeddings for Robust VQA | Aug 31, 2023 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents | Apr 25, 2018 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding | Dec 8, 2023 | Form, Question Answering | —Unverified | 0 | 0 |
| Discovering the Real Association: Multimodal Causal Reasoning in Video Question Answering | Jan 1, 2023 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Dense but Efficient VideoQA for Intricate Compositional Reasoning | Oct 19, 2022 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling | Mar 10, 2023 | Multi-Label Classification, MUlTI-LABEL-ClASSIFICATION | —Unverified | 0 | 0 |
| VideoPrism: A Foundational Visual Encoder for Video Understanding | Feb 20, 2024 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Multi-Modal Retrieval Augmentation for Open-Ended and Knowledge-Intensive Video Question Answering | Feb 17, 2025 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Multimodal Transformer with Pointer Network for the DSTC8 AVSD Challenge | Feb 25, 2020 | Question Answering, Video Question Answering | —Unverified | 0 | 0 |
| Multi-object event graph representation learning for Video Question Answering | Sep 12, 2024 | Contrastive Learning, Graph Representation Learning | —Unverified | 0 | 0 |
| Multi-Scale Progressive Attention Network for Video Question Answering | Aug 1, 2021 | Question Answering, Relational Reasoning | —Unverified | 0 | 0 |
| Data augmentation techniques for the Video Question Answering task | Aug 22, 2020 | Data Augmentation, Question Answering | —Unverified | 0 | 0 |