| First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge | Sep 20, 2024 | Multiple-choice, Question Answering | —Unverified | 0 |
| Composing Ensembles of Pre-trained Models via Iterative Consensus | Oct 20, 2022 | Arithmetic Reasoning, Image Generation | —Unverified | 0 |
| Measuring Compositional Consistency for Video Question Answering | Apr 14, 2022 | Question Answering, Video Question Answering | —Unverified | 0 |
| MarioQA: Answering Questions by Watching Gameplay Videos | Dec 6, 2016 | Question Answering, Video Question Answering | —Unverified | 0 |
| Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning | Jan 9, 2025 | Benchmarking, Question Answering | —Unverified | 0 |
| Admitting Ignorance Helps the Video Question Answering Models to Answer | Jan 15, 2025 | Question Answering, Video Question Answering | —Unverified | 0 |
| Object-Centric Representation Learning for Video Question Answering | Apr 12, 2021 | Object, Question Answering | —Unverified | 0 |
| Multi-Scale Progressive Attention Network for Video Question Answering | Aug 1, 2021 | Question Answering, Relational Reasoning | —Unverified | 0 |
| CogStream: Context-guided Streaming Video Question Answering | Jun 12, 2025 | Question Answering, Video Question Answering | —Unverified | 0 |
| Multi-object event graph representation learning for Video Question Answering | Sep 12, 2024 | Contrastive Learning, Graph Representation Learning | —Unverified | 0 |
| Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering | Jan 3, 2024 | Question Answering, Scheduling | —Unverified | 0 |
| CogME: A Cognition-Inspired Multi-Dimensional Evaluation Metric for Story Understanding | Jul 21, 2021 | Question Answering, Sentence | —Unverified | 0 |
| Multi-Modal Retrieval Augmentation for Open-Ended and Knowledge-Intensive Video Question Answering | Feb 17, 2025 | Multiple-choice, Question Answering | —Unverified | 0 |
| Multimodal Transformer with Pointer Network for the DSTC8 AVSD Challenge | Feb 25, 2020 | Question Answering, Video Question Answering | —Unverified | 0 |
| Neural-Symbolic VideoQA: Learning Compositional Spatio-Temporal Reasoning for Real-world Video Question Answering | Apr 5, 2024 | Question Answering, Video Question Answering | —Unverified | 0 |
| CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising | Dec 14, 2021 | Cross-Modal Retrieval, Decoder | —Unverified | 0 |
| AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Jan 1, 2025 | GPU, Question Answering | —Unverified | 0 |
| Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding | Mar 17, 2025 | Attribute, MME | —Unverified | 0 |
| MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling | Mar 10, 2023 | Multi-Label Classification, MUlTI-LABEL-ClASSIFICATION | —Unverified | 0 |
| Locate before Answering: Answer Guided Question Localization for Video Question Answering | Oct 5, 2022 | Question Answering, Video Question Answering | —Unverified | 0 |
| EVQAScore: Efficient Video Question Answering Data Evaluation | Nov 11, 2024 | Keyword Extraction, Question Answering | —Unverified | 0 |
| Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework | Nov 16, 2021 | Multiple-choice, Question Answering | —Unverified | 0 |
| LLMs Meet Long Video: Advancing Long Video Question Answering with An Interactive Visual Adapter in LLMs | Feb 21, 2024 | Question Answering, Video Question Answering | —Unverified | 0 |
| LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning | Aug 15, 2024 | Answer Generation, Question-Answer-Generation | —Unverified | 0 |
| Co-attentional Transformers for Story-Based Video Understanding | Oct 27, 2020 | Question Answering, Video Question Answering | —Unverified | 0 |