SOTAVerified

Action Segmentation

Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, Action Segmentation aims to split a temporally untrimmed video into segments along the time axis and to label each segment with one of a set of pre-defined action labels. The results of Action Segmentation can then be used as input to other applications, such as video-to-text and action localization.

Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
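
The task description above (split a video along time, then label each part) can be illustrated with a minimal sketch. It assumes the model has already produced one action label per frame, which is a common setup but is not stated on this page; the function name `frames_to_segments` and the example labels are hypothetical and chosen only for illustration.

```python
from itertools import groupby
from typing import List, Tuple

# (label, start_frame, end_frame_exclusive); this layout is an assumption for the sketch.
Segment = Tuple[str, int, int]


def frames_to_segments(frame_labels: List[str]) -> List[Segment]:
    """Collapse per-frame labels into contiguous labeled segments."""
    segments: List[Segment] = []
    start = 0
    # groupby yields one run per maximal stretch of identical consecutive labels.
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical per-frame predictions for a 10-frame untrimmed video.
frame_labels = ["pour"] * 3 + ["stir"] * 5 + ["pour"] * 2
print(frames_to_segments(frame_labels))
# → [('pour', 0, 3), ('stir', 3, 8), ('pour', 8, 10)]
```

Representing the output as labeled time spans rather than raw frame labels is what makes it usable by downstream applications such as the video-to-text and action localization uses mentioned above.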

Papers

Showing 51–100 of 219 papers

Title | Status | Hype
Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation | Code | 1
LOGO: A Long-Form Video Dataset for Group Action Quality Assessment | Code | 1
Automated freezing of gait assessment with marker-based motion capture and multi-stage spatial-temporal graph convolutional neural networks | Code | 1
Diffusion Action Segmentation | Code | 1
Pretrained Language Models as Visual Planners for Human Assistance | Code | 1
3D Pose-Based Temporal Action Segmentation for Figure Skating: A Fine-Grained and Jump Procedure-Aware Annotation Approach | Code | 1
M2R2: MulitModal Robotic Representation for Temporal Action Segmentation | - | 0
Action Understanding with Multiple Classes of Actors | - | 0
An Efficient Framework for Few-shot Skeleton-based Temporal Action Segmentation | - | 0
ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation | - | 0
DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation | - | 0
An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos | - | 0
Distill and Collect for Semi-Supervised Temporal Action Segmentation | - | 0
An Efficient 3D CNN for Action/Object Segmentation in Video | - | 0
DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation | - | 0
Dilated Temporal Fully-Convolutional Network for Semantic Segmentation of Motion Capture Data | - | 0
Anchor-Constrained Viterbi for Set-Supervised Action Segmentation | - | 0
Action Shuffle Alternating Learning for Unsupervised Action Segmentation | - | 0
MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | - | 0
Depthwise Separable Temporal Convolutional Network for Action Segmentation | - | 0
2by2: Weakly-Supervised Learning for Global Action Segmentation | - | 0
Depth Over RGB: Automatic Evaluation of Open Surgery Skills Using Depth Camera | - | 0
Long Short View Feature Decomposition via Contrastive Video Representation Learning | - | 0
Coupled Generative Adversarial Network for Continuous Fine-grained Action Segmentation | - | 0
LAC: Latent Action Composition for Skeleton-based Action Segmentation | - | 0
Leveraging Hierarchical Parametric Networks for Skeletal Joints Based Action Segmentation and Recognition | - | 0
Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences | - | 0
Action Segmentation with Mixed Temporal Domain Adaptation | - | 0
Continuous Human Action Recognition for Human-Machine Interaction: A Review | - | 0
Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation | - | 0
Markov Game Video Augmentation for Action Segmentation | - | 0
Condensing Action Segmentation Datasets via Generative Network Inversion | - | 0
Coherent Temporal Synthesis for Incremental Action Segmentation | - | 0
A Generalized & Robust Framework For Timestamp Supervision in Temporal Action Segmentation | - | 0
CASR: Refining Action Segmentation via Marginalizing Frame-levle Causal Relationships | - | 0
HOIST-Former: Hand-held Objects Identification Segmentation and Tracking in the Wild | - | 0
HOIST-Former: Hand-held Objects Identification, Segmentation, and Tracking in the Wild | - | 0
ADL4D: Towards A Contextually Rich Dataset for 4D Activities of Daily Living | - | 0
Learning a Weakly-Supervised Video Actor-Action Segmentation Model with a Wise Selection | - | 0
C2F-TCN: A Framework for Semi and Fully Supervised Temporal Action Segmentation | - | 0
Human Action Segmentation With Hierarchical Supervoxel Consistency | - | 0
Human Action Sequence Classification | - | 0
Improving action segmentation via explicit similarity measurement | - | 0
Improving Action Segmentation via Graph-Based Temporal Reasoning | - | 0
A Circular Window-based Cascade Transformer for Online Action Detection | - | 0
Hierarchical Attention Network for Action Segmentation | - | 0
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | - | 0
A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation | - | 0
Hand Guided High Resolution Feature Enhancement for Fine-Grained Atomic Action Segmentation within Complex Human Assemblies | - | 0
Grasp Type Revisited: A Modern Perspective on a Classical Feature for Vision | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AdaFocus (newly extracted I3D-features, LT-Context model) | Average F1 | 76.2 | - | Unverified
2 | FACT (efficient hybrid of convolution and transformer model) | Average F1 | 74.7 | - | Unverified
3 | ASQuery | Average F1 | 74.6 | - | Unverified
4 | BIT | Average F1 | 73.7 | - | Unverified
5 | DiffAct | Average F1 | 73.6 | - | Unverified
6 | BaFormer | Average F1 | 72.4 | - | Unverified
7 | CETNet | Average F1 | 71.8 | - | Unverified
8 | SF-TMN(ASFormer) | Average F1 | 71.6 | - | Unverified
9 | RF++-SSTDA | Acc | 70.8 | - | Unverified
10 | ASPnet | Average F1 | 70.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Br-Prompt+ASPnet (RGB, flow, accelerometer) | F1@50% | 88.5 | - | Unverified
2 | Semantic2Graph | F1@50% | 87.3 | - | Unverified
3 | BaFormer | F1@50% | 83.9 | - | Unverified
4 | DiffAct | F1@50% | 83.7 | - | Unverified
5 | SF-TMN(ASFormer) | F1@50% | 82.9 | - | Unverified
6 | LTContext | F1@50% | 82 | - | Unverified
7 | UVAST | F1@50% | 81.7 | - | Unverified
8 | Br-Prompt+ASFormer | F1@50% | 81.3 | - | Unverified
9 | EUT | F1@50% | 81 | - | Unverified
10 | CETNet | F1@50% | 80.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Semantic2Graph | F1@50% | 91.3 | - | Unverified
2 | FACT | F1@50% | 87.5 | - | Unverified
3 | DiffAct | F1@50% | 84.7 | - | Unverified
4 | BaFormer | F1@50% | 83.5 | - | Unverified
5 | SF-TMN(ASFormer) | F1@50% | 83.1 | - | Unverified
6 | Br-Prompt+ASFormer | F1@50% | 83 | - | Unverified
7 | DPRN | F1@50% | 82.9 | - | Unverified
8 | BIT | F1@50% | 82.6 | - | Unverified
9 | CETNet | F1@50% | 81.3 | - | Unverified
10 | UVAST | F1@50% | 81 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | UnLoc-L | Frame accuracy | 72.8 | - | Unverified
2 | Univl | Frame accuracy | 70 | - | Unverified
3 | Norton | Frame accuracy | 69.8 | - | Unverified
4 | VideoClip | Frame accuracy | 68.7 | - | Unverified
5 | TACo | Frame accuracy | 68.4 | - | Unverified
6 | VLM | Frame accuracy | 68.4 | - | Unverified
7 | MIL-NCE | Frame accuracy | 61 | - | Unverified
8 | ActBERT | Frame accuracy | 57 | - | Unverified
9 | CBT | Frame accuracy | 53.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ASQuery | F1@10% | 37.8 | - | Unverified
2 | LTContext | F1@10% | 33.9 | - | Unverified
3 | ASFormer | F1@10% | 33.4 | - | Unverified
4 | C2F-TCN | F1@10% | 33.3 | - | Unverified
5 | UVAST | F1@10% | 32.1 | - | Unverified
6 | MS-TCN++ | F1@10% | 31.6 | - | Unverified
7 | ProTAS(Offline) | F1@10% | 28.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | RL+Tree | Edit Distance | 88.53 | - | Unverified
2 | RL (full) | Edit Distance | 87.96 | - | Unverified
3 | TricorNet | Edit Distance | 86.8 | - | Unverified
4 | SDL+SC-CRF | Edit Distance | 86.21 | - | Unverified
5 | TCN | Edit Distance | 83.1 | - | Unverified
6 | ST-CNN+Seg | Edit Distance | 66.56 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TSA (FINCH) | Acc | 62.4 | - | Unverified
2 | TSA (Kmeans) | Acc | 59.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EUT | Acc | 87.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Unsup. TW-FINCH (K=avg/activity) | Accuracy | 42 | - | Unverified