| LLaVA-Pose: Enhancing Human Pose and Action Understanding via Keypoint-Integrated Instruction Tuning | Jun 26, 2025 | Action Understanding, Instruction Following | Code Available | 0 |
| The Role of Video Generation in Enhancing Data-Limited Action Understanding | May 26, 2025 | Action Recognition, Action Understanding | Unverified | 0 |
| PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition | Apr 17, 2025 | Action Recognition, Action Understanding | Unverified | 0 |
| F^3Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos | Apr 11, 2025 | Action Understanding, Event Detection | Code Available | 1 |
| RoboAct-CLIP: Video-Driven Pre-training of Atomic Action Understanding for Robotics | Apr 2, 2025 | Action Understanding, Representation Learning | Unverified | 0 |
| Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery | Mar 29, 2025 | Action Understanding, Instrument Recognition | Unverified | 0 |
| ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction | Mar 26, 2025 | Action Understanding | Unverified | 0 |
| LLaVAction: evaluating and training multi-modal large language models for action recognition | Mar 24, 2025 | Action Recognition, Action Understanding | Code Available | 2 |
| HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models | Feb 28, 2025 | Action Understanding, Text-to-Video Generation | Unverified | 0 |
| HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding | Jan 25, 2025 | Action Understanding, Emotion Recognition | Unverified | 0 |
| SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization | Jan 2, 2025 | Action Recognition, Action Understanding | Code Available | 1 |
| STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding | Jan 1, 2025 | Action Understanding, Spatio-Temporal Video Grounding | Unverified | 0 |
| Heterogeneous Skeleton-Based Action Representation Learning | Jan 1, 2025 | Action Recognition, Action Understanding | Unverified | 0 |
| About Time: Advances, Challenges, and Outlooks of Action Understanding | Nov 22, 2024 | Action Understanding, Survey | Unverified | 0 |
| Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation | Oct 31, 2024 | Action Segmentation, Action Understanding | Code Available | 1 |
| MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion | Sep 16, 2024 | Action Understanding, Contrastive Learning | Unverified | 0 |
| CathAction: A Benchmark for Endovascular Intervention Understanding | Aug 23, 2024 | Action Understanding | Unverified | 0 |
| Probing Fine-Grained Action Understanding and Cross-View Generalization of Foundation Models | Jul 22, 2024 | Action Understanding, Activity Recognition | Unverified | 0 |
| Region-aware Image-based Human Action Retrieval with Transformers | Jul 13, 2024 | Action Recognition, Action Understanding | Unverified | 0 |
| VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment | Jun 16, 2024 | Action Understanding, Benchmarking | Unverified | 0 |
| EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding | Jun 13, 2024 | Action Classification, Action Localization | Code Available | 1 |
| OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding | Jun 11, 2024 | Action Understanding, Diversity | Code Available | 2 |
| Self-Supervised Skeleton-Based Action Representation Learning: A Benchmark and Beyond | Jun 5, 2024 | Action Recognition, Action Understanding | Code Available | 0 |
| The SkatingVerse Workshop & Challenge: Methods and Results | May 27, 2024 | Action Understanding | Unverified | 0 |
| FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment | May 11, 2024 | Action Quality Assessment, Action Understanding | Code Available | 1 |