| ViDAS: Vision-based Danger Assessment and Scoring | Oct 1, 2024 | Fixed Few Shot Prompting, Fixed Few Shot Prompting Danger Assessment | —Unverified | 0 | 0 |
| Video-Bench: Human-Aligned Video Generation Benchmark | Jan 1, 2025 | Large Language Model, Video Generation | —Unverified | 0 | 0 |
| Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model | Aug 21, 2024 | Emotion Recognition, Language Modeling | —Unverified | 0 | 0 |
| Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models | Jul 8, 2025 | Future prediction, Large Language Model | —Unverified | 0 | 0 |
| VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | Nov 7, 2024 | Decoder, Language Modeling | —Unverified | 0 | 0 |
| VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | Jan 1, 2025 | Large Language Model, Video Segmentation | —Unverified | 0 | 0 |
| VideoLLM-online: Online Video Large Language Model for Streaming Video | Jun 17, 2024 | GPU, Language Modeling | —Unverified | 0 | 0 |
| Video LLMs for Temporal Reasoning in Long Videos | Dec 4, 2024 | Action Segmentation, Dense Video Captioning | —Unverified | 0 | 0 |
| Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition | May 7, 2024 | Large Language Model, Multimodal Large Language Model | —Unverified | 0 | 0 |
| VideoOrion: Tokenizing Object Dynamics in Videos | Nov 25, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |