| OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding | Jun 11, 2024 | Action Understanding, Diversity | Code Available | 2 |
| LLaVAction: evaluating and training multi-modal large language models for action recognition | Mar 24, 2025 | Action Recognition, Action Understanding | Code Available | 2 |
| Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos | Mar 26, 2022 | Action Segmentation, Action Understanding | Code Available | 1 |
| Memory-and-Anticipation Transformer for Online Action Understanding | Aug 15, 2023 | Action Detection, Action Understanding | Code Available | 1 |
| Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions | Jul 24, 2022 | Action Detection, Action Understanding | Code Available | 1 |
| Action Quality Assessment with Temporal Parsing Transformer | Jul 19, 2022 | Action Quality Assessment, Action Understanding | Code Available | 1 |
| YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos | Apr 12, 2020 | Action Understanding, Question Answering | Code Available | 1 |
| Towards Tokenized Human Dynamics Representation | Nov 22, 2021 | Action Segmentation, Action Understanding | Code Available | 1 |
| Open-Vocabulary Video Relation Extraction | Dec 25, 2023 | Action Classification, Action Understanding | Code Available | 1 |
| Paxion: Patching Action Knowledge in Video-Language Foundation Models | May 18, 2023 | Action Understanding, Diagnostic | Code Available | 1 |
| Home Action Genome: Cooperative Compositional Action Understanding | May 11, 2021 | Action Recognition, Action Understanding | Code Available | 1 |
| PIANO: A Parametric Hand Bone Model from Magnetic Resonance Imaging | Jun 21, 2021 | Action Understanding | Code Available | 1 |
| F^3Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos | Apr 11, 2025 | Action Understanding, Event Detection | Code Available | 1 |
| Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition | Sep 3, 2021 | Action Recognition, Action Understanding | Code Available | 1 |
| Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning | Aug 8, 2023 | Action Understanding, Contrastive Learning | Code Available | 1 |
| Detailed 2D-3D Joint Representation for Human-Object Interaction | Apr 17, 2020 | Action Understanding, Human-Object Interaction Detection | Code Available | 1 |
| FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment | May 11, 2024 | Action Quality Assessment, Action Understanding | Code Available | 1 |
| FineSports: A Multi-person Hierarchical Sports Video Dataset for Fine-grained Action Understanding | Jan 1, 2024 | Action Analysis, Action Understanding | Code Available | 1 |
| Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment | Feb 28, 2022 | 3D Action Recognition, Action Analysis | Code Available | 1 |
| SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization | Jan 2, 2025 | Action Recognition, Action Understanding | Code Available | 1 |
| EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding | Jun 13, 2024 | Action Classification, Action Localization | Code Available | 1 |
| Temporal Relational Modeling with Self-Supervision for Action Segmentation | Dec 14, 2020 | Action Recognition, Action Segmentation | Code Available | 1 |
| Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation | Oct 31, 2024 | Action Segmentation, Action Understanding | Code Available | 1 |
| Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports | Jan 3, 2024 | Action Understanding, counterfactual | Code Available | 1 |
| LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities | Jul 31, 2020 | Action Recognition, Action Understanding | Code Available | 1 |
| Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding | Nov 6, 2023 | Action Understanding, Representation Learning | Code Available | 1 |
| Human Action Segmentation With Hierarchical Supervoxel Consistency | Jun 1, 2015 | Action Classification, Action Segmentation | Unverified | 0 |
| HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding | Jan 25, 2025 | Action Understanding, Emotion Recognition | Unverified | 0 |
| Impact of Large Language Model Assistance on Patients Reading Clinical Notes: A Mixed-Methods Study | Jan 17, 2024 | Action Understanding, Language Modeling | Unverified | 0 |
| Intra- and Inter-Action Understanding via Temporal Action Parsing | May 20, 2020 | Action Parsing, Action Recognition | Unverified | 0 |
| Invisible-to-Visible: Privacy-Aware Human Instance Segmentation using Airborne Ultrasound via Collaborative Learning Variational Autoencoder | Apr 15, 2022 | Action Recognition, Action Understanding | Unverified | 0 |
| JRDB-Act: A Large-scale Dataset for Spatio-temporal Action, Social Group and Activity Detection | Jun 16, 2021 | Action Detection, Action Understanding | Unverified | 0 |
| Kantian Deontology Meets AI Alignment: Towards Morally Grounded Fairness Metrics | Nov 9, 2023 | Action Understanding, Ethics | Unverified | 0 |
| MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion | Sep 16, 2024 | Action Understanding, Contrastive Learning | Unverified | 0 |
| MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding | Oct 1, 2019 | Action Recognition, Action Understanding | Unverified | 0 |
| mRI: Multi-modal 3D Human Pose Estimation Dataset using mmWave, RGB-D, and Inertial Sensors | Oct 15, 2022 | 3D Human Pose Estimation, Action Detection | Unverified | 0 |
| Multitask Learning in Minimally Invasive Surgical Vision: A Review | Jan 16, 2024 | Action Understanding | Unverified | 0 |
| PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition | Apr 17, 2025 | Action Recognition, Action Understanding | Unverified | 0 |
| PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding | Mar 22, 2017 | Action Detection, Action Recognition | Unverified | 0 |
| Probing Fine-Grained Action Understanding and Cross-View Generalization of Foundation Models | Jul 22, 2024 | Action Understanding, Activity Recognition | Unverified | 0 |
| Region-aware Image-based Human Action Retrieval with Transformers | Jul 13, 2024 | Action Recognition, Action Understanding | Unverified | 0 |
| RoboAct-CLIP: Video-Driven Pre-training of Atomic Action Understanding for Robotics | Apr 2, 2025 | Action Understanding, Representation Learning | Unverified | 0 |
| Scene Understanding for Autonomous Manipulation with Deep Learning | Mar 23, 2019 | Action Understanding, Affordance Detection | Unverified | 0 |
| ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction | Mar 26, 2025 | Action Understanding | Unverified | 0 |
| Self-supervised Discovery of Human Actons from Long Kinematic Videos | Sep 29, 2021 | Action Understanding, Sentence | Unverified | 0 |
| Social-MAE: Social Masked Autoencoder for Multi-person Motion Representation Learning | Apr 8, 2024 | Action Understanding, Decoder | Unverified | 0 |
| STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding | Jan 1, 2025 | Action Understanding, Spatio-Temporal Video Grounding | Unverified | 0 |
| The SkatingVerse Workshop & Challenge: Methods and Results | May 27, 2024 | Action Understanding | Unverified | 0 |
| Action Understanding with Multiple Classes of Actors | Apr 27, 2017 | Action Recognition, Action Segmentation | Unverified | 0 |
| Actor and Action Modular Network for Text-based Video Segmentation | Nov 2, 2020 | Action Segmentation, Action Understanding | Unverified | 0 |