SOTAVerified

Action Segmentation

Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, Action Segmentation splits a temporally untrimmed video into contiguous time segments and labels each segment with one of a set of pre-defined action labels. The results can then serve as input to downstream applications such as video-to-text and action localization.

Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
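
In practice a segmentation model usually predicts one label per frame, and the segments are the maximal runs of identical consecutive labels. The Python sketch below shows that conversion. The function name `frames_to_segments` and the half-open `[start, end)` frame convention are illustrative assumptions, not part of any benchmark's official tooling.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def frames_to_segments(labels: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-frame labels into (label, start, end) runs; end is exclusive.

    Hypothetical helper for illustration only.
    """
    segments = []
    start = 0
    # groupby yields one group per run of identical consecutive labels
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


if __name__ == "__main__":
    frames = ["pour", "pour", "stir", "stir", "stir", "pour"]
    print(frames_to_segments(frames))
    # [('pour', 0, 2), ('stir', 2, 5), ('pour', 5, 6)]
```

Note that the same action can appear in several non-adjacent segments, as `pour` does here.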

Papers

Showing 101-150 of 219 papers

Title | Status | Hype
SigFormer: Sparse Signal-Guided Transformer for Multi-Modal Human Action Segmentation | Code | 0
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | | 0
CASR: Refining Action Segmentation via Marginalizing Frame-level Causal Relationships | | 0
NSM4D: Neural Scene Model Based Online 4D Point Cloud Sequence Understanding | | 0
Action Segmentation Using 2D Skeleton Heatmaps and Multi-Modality Fusion | | 0
OTAS: Unsupervised Boundary Detection for Object-Centric Temporal Action Segmentation | Code | 0
Prompt-enhanced Hierarchical Transformer Elevating Cardiopulmonary Resuscitation Instruction via Temporal Action Segmentation | | 0
LAC: Latent Action Composition for Skeleton-based Action Segmentation | | 0
BIT: Bi-Level Temporal Modeling for Efficient Supervised Action Segmentation | | 0
UnLoc: A Unified Framework for Video Localization Tasks | Code | 0
DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation | | 0
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding | Code | 0
SF-TMN: SlowFast Temporal Modeling Network for Surgical Phase Recognition | | 0
Enhancing Transformer Backbone for Egocentric Video Action Segmentation | | 0
MED-VT++: Unifying Multimodal Learning with a Multiscale Encoder-Decoder Video Transformer | | 0
Therbligs in Action: Video Understanding through Motion Primitives | | 0
DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation | | 0
MS-TCRNet: Multi-Stage Temporal Convolutional Recurrent Networks for Action Segmentation Using Sensor-Augmented Kinematics | Code | 0
TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering | | 0
Temporal Segment Transformer for Action Segmentation | | 0
Reducing the Label Bias for Timestamp Supervised Temporal Action Segmentation | | 0
LAC - Latent Action Composition for Skeleton-based Action Segmentation | | 0
ASPnet: Action Segmentation With Shared-Private Representation of Multiple Data Sources | | 0
Markov Game Video Augmentation for Action Segmentation | | 0
Weakly-Supervised Action Segmentation and Unseen Error Detection in Anomalous Instructional Videos | | 0
Video Action Segmentation via Contextually Refined Temporal Keypoints | | 0
Timestamp-Supervised Action Segmentation from the Perspective of Clustering | Code | 0
C2F-TCN: A Framework for Semi and Fully Supervised Temporal Action Segmentation | | 0
Hand Guided High Resolution Feature Enhancement for Fine-Grained Atomic Action Segmentation within Complex Human Assemblies | | 0
Distill and Collect for Semi-Supervised Temporal Action Segmentation | | 0
Robust Action Segmentation from Timestamp Supervision | | 0
Semantic2Graph: Graph-based Multi-modal Feature Fusion for Action Segmentation in Videos | | 0
A Circular Window-based Cascade Transformer for Online Action Detection | | 0
An Efficient Framework for Few-shot Skeleton-based Temporal Action Segmentation | | 0
A Generalized & Robust Framework For Timestamp Supervision in Temporal Action Segmentation | | 0
Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation | | 0
Turning to a Teacher for Timestamp Supervised Temporal Action Segmentation | | 0
Timestamp-Supervised Action Segmentation with Graph Convolutional Networks | | 0
Exploring Temporally Dynamic Data Augmentation for Video Recognition | | 0
Surgical Phase Recognition in Laparoscopic Cholecystectomy | | 0
Do we really need temporal convolutions in action segmentation? | Code | 0
A Wireless-Vision Dataset for Privacy Preserving Human Activity Recognition | | 0
Action parsing using context features | | 0
Cross-Enhancement Transformer for Action Segmentation | Code | 0
Maximization and restoration: Action segmentation through dilation passing and temporal reconstruction | | 0
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities | Code | 0
Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos | | 0
Continuous Human Action Recognition for Human-Machine Interaction: A Review | | 0
Transformers in Action: Weakly Supervised Action Segmentation | | 0
Fast and Unsupervised Action Boundary Detection for Action Segmentation | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AdaFocus (newly extracted I3D-features, LT-Context model) | Average F1 | 76.2 | | Unverified
2 | FACT (efficient hybrid of convolution and transformer model) | Average F1 | 74.7 | | Unverified
3 | ASQuery | Average F1 | 74.6 | | Unverified
4 | BIT | Average F1 | 73.7 | | Unverified
5 | DiffAct | Average F1 | 73.6 | | Unverified
6 | BaFormer | Average F1 | 72.4 | | Unverified
7 | CETNet | Average F1 | 71.8 | | Unverified
8 | SF-TMN(ASFormer) | Average F1 | 71.6 | | Unverified
9 | RF++-SSTDA | Acc | 70.8 | | Unverified
10 | ASPnet | Average F1 | 70.6 | | Unverified
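
The segmental F1@k scores reported in several of these tables are commonly computed as in the public MS-TCN evaluation script: each predicted segment is compared with the ground-truth segments of the same label, it counts as a true positive if its best temporal IoU is at least k percent and that ground-truth segment has not already been matched, and as a false positive otherwise; unmatched ground-truth segments are false negatives. The Python sketch below follows that recipe but omits details such as ignoring background classes. This page does not define "Average F1"; the hypothetical `average_f1` helper assumes it is the mean of F1@{10, 25, 50}, which should be checked against each paper.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Segment = Tuple[str, int, int]  # (label, start, end), end exclusive


def to_segments(labels: Sequence[str]) -> List[Segment]:
    """Maximal runs of identical consecutive frame labels."""
    segs, start = [], 0
    for label, run in groupby(labels):
        n = sum(1 for _ in run)
        segs.append((label, start, start + n))
        start += n
    return segs


def f1_at_k(pred: Sequence[str], gt: Sequence[str], tau: float) -> float:
    """Segmental F1 at IoU threshold tau (e.g. 0.5 for F1@50%), as a percentage."""
    p_segs, g_segs = to_segments(pred), to_segments(gt)
    used = [False] * len(g_segs)
    tp = fp = 0
    for label, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (g_label, gs, ge) in enumerate(g_segs):
            if g_label != label:
                continue  # only same-label segments can match
            inter = max(0, min(pe, ge) - max(ps, gs))
            # Span of both segments; equals the union whenever they overlap,
            # and non-overlapping pairs get IoU 0 anyway.
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= tau and not used[best_j]:
            tp += 1
            used[best_j] = True
        else:
            fp += 1
    fn = len(g_segs) - sum(used)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


def average_f1(pred: Sequence[str], gt: Sequence[str],
               taus: Sequence[float] = (0.10, 0.25, 0.50)) -> float:
    """ASSUMPTION: mean of F1@{10,25,50}; not defined on the leaderboard page."""
    return sum(f1_at_k(pred, gt, t) for t in taus) / len(taus)


if __name__ == "__main__":
    gt = ["a"] * 4 + ["b"] * 4
    pred = ["a"] * 3 + ["b"] * 5  # boundary predicted one frame early
    print(f1_at_k(pred, gt, 0.50))  # 100.0: IoUs are 0.75 and 0.8
    print(average_f1(pred, gt))
```

Because the score is computed over segments rather than frames, one spurious short segment adds a false positive even when it covers only a few frames.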

# | Model | Metric | Claimed | Verified | Status
1 | Br-Prompt+ASPnet (RGB, flow, accelerometer) | F1@50% | 88.5 | | Unverified
2 | Semantic2Graph | F1@50% | 87.3 | | Unverified
3 | BaFormer | F1@50% | 83.9 | | Unverified
4 | DiffAct | F1@50% | 83.7 | | Unverified
5 | SF-TMN(ASFormer) | F1@50% | 82.9 | | Unverified
6 | LTContext | F1@50% | 82 | | Unverified
7 | UVAST | F1@50% | 81.7 | | Unverified
8 | Br-Prompt+ASFormer | F1@50% | 81.3 | | Unverified
9 | EUT | F1@50% | 81 | | Unverified
10 | CETNet | F1@50% | 80.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Semantic2Graph | F1@50% | 91.3 | | Unverified
2 | FACT | F1@50% | 87.5 | | Unverified
3 | DiffAct | F1@50% | 84.7 | | Unverified
4 | BaFormer | F1@50% | 83.5 | | Unverified
5 | SF-TMN(ASFormer) | F1@50% | 83.1 | | Unverified
6 | Br-Prompt+ASFormer | F1@50% | 83 | | Unverified
7 | DPRN | F1@50% | 82.9 | | Unverified
8 | BIT | F1@50% | 82.6 | | Unverified
9 | CETNet | F1@50% | 81.3 | | Unverified
10 | UVAST | F1@50% | 81 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | UnLoc-L | Frame accuracy | 72.8 | | Unverified
2 | Univl | Frame accuracy | 70 | | Unverified
3 | Norton | Frame accuracy | 69.8 | | Unverified
4 | VideoClip | Frame accuracy | 68.7 | | Unverified
5 | TACo | Frame accuracy | 68.4 | | Unverified
6 | VLM | Frame accuracy | 68.4 | | Unverified
7 | MIL-NCE | Frame accuracy | 61 | | Unverified
8 | ActBERT | Frame accuracy | 57 | | Unverified
9 | CBT | Frame accuracy | 53.9 | | Unverified
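
Frame accuracy is the percentage of frames whose predicted label equals the ground-truth label. Because it is computed frame by frame, it does not penalize over-segmentation (many short spurious segments), which is why segmental metrics such as F1@k and edit score are usually reported alongside it. The Python sketch below handles a single video; `frame_accuracy` is a hypothetical name, and common evaluation scripts pool correct and total frame counts over all videos in a split rather than averaging per-video scores, which this sketch does not do.

```python
from typing import Sequence


def frame_accuracy(pred: Sequence[str], gt: Sequence[str]) -> float:
    """Percentage of frames where the predicted label matches the ground truth."""
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same length")
    if not gt:
        raise ValueError("cannot score an empty video")
    correct = sum(p == g for p, g in zip(pred, gt))
    return 100.0 * correct / len(gt)


if __name__ == "__main__":
    print(frame_accuracy(["a", "a", "b", "b"], ["a", "b", "b", "b"]))  # 75.0
```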

# | Model | Metric | Claimed | Verified | Status
1 | ASQuery | F1@10% | 37.8 | | Unverified
2 | LTContext | F1@10% | 33.9 | | Unverified
3 | ASFormer | F1@10% | 33.4 | | Unverified
4 | C2F-TCN | F1@10% | 33.3 | | Unverified
5 | UVAST | F1@10% | 32.1 | | Unverified
6 | MS-TCN++ | F1@10% | 31.6 | | Unverified
7 | ProTAS(Offline) | F1@10% | 28.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | RL+Tree | Edit Distance | 88.53 | | Unverified
2 | RL (full) | Edit Distance | 87.96 | | Unverified
3 | TricorNet | Edit Distance | 86.8 | | Unverified
4 | SDL+SC-CRF | Edit Distance | 86.21 | | Unverified
5 | TCN | Edit Distance | 83.1 | | Unverified
6 | ST-CNN+Seg | Edit Distance | 66.56 | | Unverified
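
Since higher values rank better here, the Edit Distance column is most likely the segmental edit score: the Levenshtein distance between the ordered lists of predicted and ground-truth segment labels, normalized by the longer list and reported as `(1 - distance / max_length) * 100`. It rewards getting the order of actions right and penalizes over-segmentation, while ignoring exact boundary timing. The Python sketch below assumes that definition and skips the background-class filtering some evaluation scripts apply; the helper names are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence


def segment_labels(labels: Sequence[str]) -> List[str]:
    """Ordered segment labels: consecutive duplicate frame labels collapsed."""
    return [label for label, _ in groupby(labels)]


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def edit_score(pred: Sequence[str], gt: Sequence[str]) -> float:
    """ASSUMED definition: (1 - normalized segment-level edit distance) * 100."""
    p, g = segment_labels(pred), segment_labels(gt)
    longest = max(len(p), len(g))
    if longest == 0:
        return 100.0
    return (1.0 - levenshtein(p, g) / longest) * 100.0


if __name__ == "__main__":
    gt = ["a", "a", "b", "b", "c", "c"]    # segments: a, b, c
    pred = ["a", "a", "b", "a", "b", "c"]  # segments: a, b, a, b, c
    print(edit_score(pred, gt))  # two extra segments out of five -> about 60
```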

# | Model | Metric | Claimed | Verified | Status
1 | TSA (FINCH) | Acc | 62.4 | | Unverified
2 | TSA (Kmeans) | Acc | 59.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EUT | Acc | 87.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Unsup. TW-FINCH (K=avg/activity) | Accuracy | 42 | | Unverified
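
Unsupervised entries such as TW-FINCH output cluster IDs rather than action names, so accuracy is usually measured after mapping clusters to ground-truth classes with the one-to-one assignment that maximizes the number of correctly labelled frames, typically found with the Hungarian algorithm (for example SciPy's `scipy.optimize.linear_sum_assignment`). Whether that matching is done per video or over the whole dataset varies between papers. To stay dependency-free, the Python sketch below brute-forces every assignment, which is only practical for a handful of clusters; `matched_accuracy` is a hypothetical helper name.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def matched_accuracy(clusters: Sequence[Hashable], gt: Sequence[Hashable]) -> float:
    """Frame accuracy under the best one-to-one mapping of cluster IDs to labels.

    Brute force over all injective mappings (exponential); the Hungarian
    algorithm finds the same optimum in polynomial time.
    """
    if len(clusters) != len(gt) or not gt:
        raise ValueError("need equal-length, non-empty sequences")
    cluster_ids = sorted(set(clusters), key=repr)
    label_ids = sorted(set(gt), key=repr)
    # counts[(c, l)]: number of frames assigned to cluster c whose true label is l
    counts = Counter(zip(clusters, gt))
    # Pad with None so surplus clusters can stay unmapped (they score 0 frames).
    candidates = list(label_ids) + [None] * max(0, len(cluster_ids) - len(label_ids))
    best = 0
    for assignment in permutations(candidates, len(cluster_ids)):
        hits = sum(counts[(c, l)] for c, l in zip(cluster_ids, assignment)
                   if l is not None)
        best = max(best, hits)
    return 100.0 * best / len(gt)


if __name__ == "__main__":
    clusters = [0, 0, 1, 1, 1, 2]
    gt = ["pour", "pour", "stir", "stir", "pour", "stir"]
    print(matched_accuracy(clusters, gt))  # 0->pour, 1->stir matches 4 of 6 frames
```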