| The Solution for the ICCV 2023 Perception Test Challenge 2023 -- Task 6 -- Grounded videoQA | Jul 2, 2024 | Grounded Video Question Answering, Object Tracking | Unverified | 0 | 0 |
| Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration | May 21, 2024 | Question Answering, Video Question Answering | Unverified | 0 | 0 |
| TimeLogic: A Temporal Logic Benchmark for Video QA | Jan 13, 2025 | 2k, Action Segmentation | Unverified | 0 | 0 |
| Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training | Jul 5, 2020 | Decoder, Question Answering | Unverified | 0 | 0 |
| TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs | Mar 13, 2025 | Benchmarking, Question Answering | Unverified | 0 | 0 |
| Top-down Activity Representation Learning for Video Question Answering | Sep 12, 2024 | Question Answering, Representation Learning | Unverified | 0 | 0 |
| Towards Fine-Grained Video Question Answering | Mar 10, 2025 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Towards Understanding Camera Motions in Any Video | Apr 21, 2025 | Question Answering, Text Retrieval | Unverified | 0 | 0 |
| Traffic-Domain Video Question Answering with Automatic Captioning | Jul 18, 2023 | Question Answering, Video Question Answering | Unverified | 0 | 0 |
| Transferring Domain-Agnostic Knowledge in Video Question Answering | Oct 26, 2021 | Question Answering, Transfer Learning | Unverified | 0 | 0 |
| Gaining Extra Supervision via Multi-task learning for Multi-Modal Video Question Answering | May 28, 2019 | Inductive Bias, Metric Learning | Unverified | 0 | 0 |
| Trying Bilinear Pooling in Video-QA | Dec 18, 2020 | Question Answering, Video Question Answering | Unverified | 0 | 0 |
| Uncertainty-Guided Self-Questioning and Answering for Video-Language Alignment | Sep 17, 2024 | Question Answering, Video Question Answering | Unverified | 0 | 0 |
| Uncovering Temporal Context for Video Question and Answering | Nov 15, 2015 | Decoder, Multiple-choice | Unverified | 0 | 0 |
| Understanding Complexity in VideoQA via Visual Program Generation | May 19, 2025 | Code Generation, Question Answering | Unverified | 0 | 0 |
| Understanding Video Scenes through Text: Insights from Text-based Video Question Answering | Sep 4, 2023 | Domain Adaptation, Question Answering | Unverified | 0 | 0 |
| Unlocking Video-LLM via Agent-of-Thoughts Distillation | Dec 2, 2024 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs | Oct 21, 2024 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding | Dec 4, 2023 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| ATM: Action Temporality Modeling for Video Question Answering | Sep 5, 2023 | Contrastive Learning, Optical Flow Estimation | Unverified | 0 | 0 |
| VDMA: Video Question Answering with Dynamically Generated Multi-Agents | Jul 4, 2024 | EgoSchema, Question Answering | Unverified | 0 | 0 |
| Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models | Aug 22, 2024 | Question Answering, Video Question Answering | Unverified | 0 | 0 |
| Flexible Frame Selection for Efficient Video Reasoning | Jan 1, 2025 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering | Dec 12, 2024 | feature selection, Language Modeling | Unverified | 0 | 0 |
| Frame-Subtitle Self-Supervision for Multi-Modal Video Question Answering | Sep 8, 2022 | Question Answering, Video Question Answering | Unverified | 0 | 0 |