| Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs | Oct 14, 2024 | Computational EfficiencyQuestion Answering | CodeCode Available | 2 | 5 |
| PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Dec 20, 2024 | Video Understanding | CodeCode Available | 2 | 5 |
| Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency | Jun 2, 2025 | reinforcement-learningReinforcement Learning | CodeCode Available | 2 | 5 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | Nov 28, 2023 | 3D Question Answering (3D-QA)Diagnostic | CodeCode Available | 2 | 5 |
| Multi-granularity Correspondence Learning from Long-term Noisy Videos | Jan 30, 2024 | Action SegmentationLong Video Retrieval (Background Removed) | CodeCode Available | 2 | 5 |
| Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs | Jun 13, 2024 | BenchmarkingQuestion Answering | CodeCode Available | 2 | 5 |
| MMVU: Measuring Expert-Level Multi-Discipline Video Understanding | Jan 21, 2025 | Video Understanding | CodeCode Available | 2 | 5 |
| MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | Jul 31, 2023 | Multiple-choiceQuestion Answering | CodeCode Available | 2 | 5 |
| Neptune: The Long Orbit to Benchmarking Long Video Understanding | Dec 12, 2024 | BenchmarkingMultimodal Reasoning | CodeCode Available | 2 | 5 |
| LongVLM: Efficient Long Video Understanding via Large Language Models | Apr 4, 2024 | Question AnsweringVideo Question Answering | CodeCode Available | 2 | 5 |
| LVBench: An Extreme Long Video Understanding Benchmark | Jun 12, 2024 | Decision MakingVideo Understanding | CodeCode Available | 2 | 5 |
| LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos | Nov 29, 2024 | Boundary DetectionDense Video Captioning | CodeCode Available | 2 | 5 |
| Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model | Mar 27, 2025 | EgoSchemaLanguage Modeling | CodeCode Available | 2 | 5 |
| LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding | Jul 22, 2024 | Multiple-choiceQuestion Answering | CodeCode Available | 2 | 5 |
| Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models | Oct 4, 2024 | Dense Video CaptioningSentence | CodeCode Available | 2 | 5 |
| Dense Connector for MLLMs | May 22, 2024 | Video Understanding | CodeCode Available | 2 | 5 |
| DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | May 30, 2024 | DeepFake DetectionMamba | CodeCode Available | 2 | 5 |
| Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Nov 14, 2023 | Image-based Generative Performance BenchmarkingLanguage Modeling | CodeCode Available | 2 | 5 |
| InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models | Dec 18, 2024 | Reasoning SegmentationSegmentation | CodeCode Available | 2 | 5 |
| Omni-Video: Democratizing Unified Video Understanding and Generation | Jul 8, 2025 | Video GenerationVideo Understanding | CodeCode Available | 2 | 5 |
| One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | May 29, 2025 | Contrastive LearningText Retrieval | CodeCode Available | 2 | 5 |
| OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | Jun 24, 2024 | AI AgentLarge Language Model | CodeCode Available | 2 | 5 |
| Re-thinking Temporal Search for Long-Form Video Understanding | Apr 3, 2025 | Computational EfficiencyForm | CodeCode Available | 2 | 5 |
| Boosting Single Image Super-Resolution via Partial Channel Shifting | Jan 1, 2023 | DiversityImage Super-Resolution | CodeCode Available | 1 | 5 |
| Leveraging triplet loss for unsupervised action segmentation | Apr 13, 2023 | Action SegmentationClustering | CodeCode Available | 1 | 5 |