| LLaVA-Pose: Enhancing Human Pose and Action Understanding via Keypoint-Integrated Instruction Tuning | Jun 26, 2025 | Action Understanding, Instruction Following | Code Available | 0 |
| The Role of Video Generation in Enhancing Data-Limited Action Understanding | May 26, 2025 | Action Recognition, Action Understanding | Unverified | 0 |
| PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition | Apr 17, 2025 | Action Recognition, Action Understanding | Unverified | 0 |
| F^3Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos | Apr 11, 2025 | Action Understanding, Event Detection | Code Available | 1 |
| RoboAct-CLIP: Video-Driven Pre-training of Atomic Action Understanding for Robotics | Apr 2, 2025 | Action Understanding, Representation Learning | Unverified | 0 |
| Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery | Mar 29, 2025 | Action Understanding, Instrument Recognition | Unverified | 0 |
| ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction | Mar 26, 2025 | Action Understanding | Unverified | 0 |
| LLaVAction: evaluating and training multi-modal large language models for action recognition | Mar 24, 2025 | Action Recognition, Action Understanding | Code Available | 2 |
| HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models | Feb 28, 2025 | Action Understanding, Text-to-Video Generation | Unverified | 0 |
| HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding | Jan 25, 2025 | Action Understanding, Emotion Recognition | Unverified | 0 |