| VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation | Jun 8, 2021 | Multi-Task Learning, Question Answering | Code Available | 1 |
| DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization | Jun 1, 2021 | Question Answering, Retrieval | Code Available | 1 |
| NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions | May 18, 2021 | Question Answering, Video Question Answering | Code Available | 1 |
| SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events | Mar 29, 2021 | Autonomous Vehicles, Benchmarking | Code Available | 1 |
| A Comprehensive Review of the Video-to-Text Problem | Mar 27, 2021 | Question Answering, Retrieval | Code Available | 1 |
| On the hidden treasure of dialog in video question answering | Mar 26, 2021 | Question Answering, Video Question Answering | Code Available | 1 |
| Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling | Feb 11, 2021 | Question Answering, Retrieval | Code Available | 1 |
| CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions | Dec 8, 2020 | counterfactual, Descriptive | Code Available | 1 |
| Just Ask: Learning to Answer Questions from Millions of Narrated Videos | Dec 1, 2020 | Question Answering, Question Generation | Code Available | 1 |
| Location-aware Graph Convolutional Networks for Video Question Answering | Aug 7, 2020 | Action Recognition, graph construction | Code Available | 1 |
| Visual Relation Grounding in Videos | Jul 17, 2020 | Question Answering, Relation | Code Available | 1 |
| Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions | Jul 17, 2020 | Question Answering, Video Question Answering | Code Available | 1 |
| Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA | May 13, 2020 | Image Captioning, Multi-Label Classification | Code Available | 1 |
| LifeQA: A Real-life Dataset for Video Question Answering | May 1, 2020 | Multiple-choice, Question Answering | Code Available | 1 |
| HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training | May 1, 2020 | Language Modeling, Language Modelling | Code Available | 1 |
| Hierarchical Conditional Relation Networks for Video Question Answering | Feb 25, 2020 | Audio-Visual Question Answering (AVQA), Question Answering | Code Available | 1 |
| How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | Jun 19, 2025 | Multiple-choice, Question Answering | Unverified | 0 |
| CogStream: Context-guided Streaming Video Question Answering | Jun 12, 2025 | Question Answering, Video Question Answering | Unverified | 0 |
| Looking Beyond Visible Cues: Implicit Video Question Answering via Dual-Clue Reasoning | Jun 9, 2025 | Future prediction, Question Answering | Code Available | 0 |
| EgoVLM: Policy Optimization for Egocentric Video Understanding | Jun 3, 2025 | EgoSchema, Question Answering | Code Available | 0 |
| VUDG: A Dataset for Video Understanding Domain Generalization | May 30, 2025 | Domain Generalization, Multiple-choice | Unverified | 0 |
| Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering | May 30, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos | May 29, 2025 | Question Answering, Video Generation | Code Available | 0 |
| LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval | May 21, 2025 | Autonomous Driving, Question Answering | Unverified | 0 |
| ViQAgent: Zero-Shot Video Question Answering via Agent with Open-Vocabulary Grounding Validation | May 21, 2025 | Decision Making, Language Modeling | Code Available | 0 |