| VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning | Mar 17, 2025 | Grounded Video Question Answering, Question Answering | Code Available | 3 | 5 |
| Perception Test: A Diagnostic Benchmark for Multimodal Video Models | May 23, 2023 | Diagnostic, Grounded Video Question Answering | Code Available | 2 | 5 |
| Can I Trust Your Answer? Visually Grounded Video Question Answering | Sep 4, 2023 | Grounded Video Question Answering, Question Answering | Code Available | 1 | 5 |
| VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format | Nov 27, 2024 | Dense Video Captioning, Grounded Video Question Answering | Code Available | 1 | 5 |
| Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Nov 29, 2024 | Benchmarking, Grounded Video Question Answering | Unverified | 0 | 0 |
| The Solution for the ICCV 2023 Perception Test Challenge 2023 -- Task 6 -- Grounded videoQA | Jul 2, 2024 | Grounded Video Question Answering, Object Tracking | Unverified | 0 | 0 |
| Perception Test 2023: A Summary of the First Challenge And Outcome | Dec 20, 2023 | Benchmarking, Grounded Video Question Answering | Unverified | 0 | 0 |