| QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design | May 22, 2025 | CPUGPU | CodeCode Available | 2 | 5 |
| PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Nov 4, 2024 | Caption GenerationMultiple-choice | CodeCode Available | 2 | 5 |
| QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension | Mar 11, 2025 | AutoMLDecoder | CodeCode Available | 2 | 5 |
| Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation | Apr 3, 2025 | Computational EfficiencyGPU | CodeCode Available | 2 | 5 |
| ST-LLM: Large Language Models Are Effective Temporal Learners | Mar 30, 2024 | MVBenchReading Comprehension | CodeCode Available | 2 | 5 |
| TinyLLaVA-Video: A Simple Framework of Small-scale Large Multimodal Models for Video Understanding | Jan 26, 2025 | Video Understanding | CodeCode Available | 2 | 5 |
| OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? | Jan 9, 2025 | BenchmarkingVideo Understanding | CodeCode Available | 2 | 5 |
| Attention Mechanisms in Computer Vision: A Survey | Nov 15, 2021 | image-classificationImage Classification | CodeCode Available | 2 | 5 |
| Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Mar 31, 2025 | Logical ReasoningMultiple-choice | CodeCode Available | 2 | 5 |
| E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding | Sep 26, 2024 | Question AnsweringVideo Understanding | CodeCode Available | 2 | 5 |
| One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | May 29, 2025 | Contrastive LearningText Retrieval | CodeCode Available | 2 | 5 |
| OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | Jun 24, 2024 | AI AgentLarge Language Model | CodeCode Available | 2 | 5 |
| OmniVid: A Generative Framework for Universal Video Understanding | Mar 26, 2024 | Action RecognitionDecoder | CodeCode Available | 2 | 5 |
| Online Video Understanding: OVBench and VideoChat-Online | Dec 31, 2024 | Autonomous DrivingQuestion Answering | CodeCode Available | 2 | 5 |
| Omni-Video: Democratizing Unified Video Understanding and Generation | Jul 8, 2025 | Video GenerationVideo Understanding | CodeCode Available | 2 | 5 |
| Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Nov 14, 2023 | Image-based Generative Performance BenchmarkingLanguage Modeling | CodeCode Available | 2 | 5 |
| ActionFormer: Localizing Moments of Actions with Transformers | Feb 16, 2022 | Action LocalizationAction Recognition | CodeCode Available | 2 | 5 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | Nov 28, 2023 | 3D Question Answering (3D-QA)Diagnostic | CodeCode Available | 2 | 5 |
| A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future | Jul 18, 2023 | Knowledge Distillationobject-detection | CodeCode Available | 2 | 5 |
| Multi-granularity Correspondence Learning from Long-term Noisy Videos | Jan 30, 2024 | Action SegmentationLong Video Retrieval (Background Removed) | CodeCode Available | 2 | 5 |
| PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Dec 20, 2024 | Video Understanding | CodeCode Available | 2 | 5 |
| PyTorchVideo: A Deep Learning Library for Video Understanding | Nov 18, 2021 | Deep LearningSelf-Supervised Learning | CodeCode Available | 2 | 5 |
| Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | Mar 24, 2023 | Highlight DetectionMoment Retrieval | CodeCode Available | 2 | 5 |
| AIM: Adapting Image Models for Efficient Video Action Recognition | Feb 6, 2023 | Action ClassificationAction Recognition | CodeCode Available | 2 | 5 |
| Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs | Jun 13, 2024 | BenchmarkingQuestion Answering | CodeCode Available | 2 | 5 |