SOTAVerified

Action Segmentation

Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, it aims to divide a temporally untrimmed video into segments along the time axis and to label each segment with one of a set of predefined action labels. The resulting segmentation can serve as input to downstream applications such as video-to-text and action localization.

Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
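To make the task description concrete, the following is a minimal sketch of the output a segmentation model is expected to yield: per-frame labels collapsed into labeled temporal segments. The function name `frames_to_segments`, the half-open `(label, start, end)` convention, and the action names are illustrative assumptions, not taken from any of the listed papers.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical helper: the name and the (label, start, end) convention
# are assumptions made for illustration only.
def frames_to_segments(frame_labels: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-frame labels into (label, start, end) segments.

    `start` is inclusive and `end` is exclusive, both in frame indices.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


# Made-up per-frame predictions for a six-frame video.
labels = ["pour", "pour", "stir", "stir", "stir", "pour"]
print(frames_to_segments(labels))
```

Segment-level metrics such as those in the benchmark tables are typically computed on a representation like this one, whereas frame accuracy is computed directly on the per-frame labels.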

Papers

Showing 1-50 of 219 papers

Title | Status | Hype
Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation | Code | 2
Hierarchical NeuroSymbolic Approach for Comprehensive and Explainable Action Quality Assessment | Code | 2
Multi-granularity Correspondence Learning from Long-term Noisy Videos | Code | 2
FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation | Code | 2
Is Weakly-supervised Action Segmentation Ready For Human-Robot Interaction? No, Let's Improve It With Action-union Learning | Code | 2
Temporal Action Segmentation: An Analysis of Modern Techniques | Code | 2
EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models | Code | 1
Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | Code | 1
Hierarchical Vector Quantization for Unsupervised Action Segmentation | Code | 1
Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation | Code | 1
ASQuery: A Query-based Model for Action Segmentation | Code | 1
3D Pose-Based Temporal Action Segmentation for Figure Skating: A Fine-Grained and Jump Procedure-Aware Annotation Approach | Code | 1
Long-Tail Temporal Action Segmentation with Group-wise Temporal Logit Adjustment | Code | 1
LOGO: A Long-Form Video Dataset for Group Action Quality Assessment | Code | 1
Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos | Code | 1
A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation | Code | 1
Activity Grammars for Temporal Action Segmentation | Code | 1
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning | Code | 1
How Much Temporal Long-Term Context is Needed for Action Segmentation? | Code | 1
Permutation-Aware Action Segmentation via Unsupervised Frame-to-Segment Alignment | Code | 1
Pretrained Language Models as Visual Planners for Human Assistance | Code | 1
Leveraging triplet loss for unsupervised action segmentation | Code | 1
Diffusion Action Segmentation | Code | 1
Streaming Video Temporal Action Segmentation In Real Time | Code | 1
Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation | Code | 1
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks | Code | 1
Temporal Alignment Networks for Long-term Video | Code | 1
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos | Code | 1
HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction | Code | 1
Skeleton-Based Action Segmentation with Multi-Stage Spatial-Temporal Graph Convolutional Neural Networks | Code | 1
Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency | Code | 1
Iterative Contrast-Classify For Semi-supervised Temporal Action Segmentation | Code | 1
Towards Tokenized Human Dynamics Representation | Code | 1
Few-Shot Temporal Action Localization with Query Adaptive Transformer | Code | 1
ASFormer: Transformer for Action Segmentation | Code | 1
Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering | Code | 1
Coarse to Fine Multi-Resolution Temporal Convolutional Network | Code | 1
Efficient Two-Step Networks for Temporal Action Segmentation | Code | 1
Automated freezing of gait assessment with marker-based motion capture and multi-stage spatial-temporal graph convolutional neural networks | Code | 1
Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation | Code | 1
Learning Discriminative Prototypes with Dynamic Time Warping | Code | 1
Temporal Action Segmentation from Timestamp Supervision | Code | 1
Global2Local: Efficient Structure Search for Video Action Segmentation | Code | 1
Refining Action Segmentation With Hierarchical Video Representations | Code | 1
Temporal Relational Modeling with Self-Supervision for Action Segmentation | Code | 1
Alleviating Class-wise Gradient Imbalance for Pulmonary Airway Segmentation | Code | 1
Boundary-Aware Cascade Networks for Temporal Action Segmentation | Code | 1
Alleviating Over-segmentation Errors by Detecting Action Boundaries | Code | 1
MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation | Code | 1
Learning to Segment Actions from Observation and Narration | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AdaFocus (newly extracted I3D-features, LT-Context model) | Average F1 | 76.2 | | Unverified
2 | FACT (efficient hybrid of convolution and transformer model) | Average F1 | 74.7 | | Unverified
3 | ASQuery | Average F1 | 74.6 | | Unverified
4 | BIT | Average F1 | 73.7 | | Unverified
5 | DiffAct | Average F1 | 73.6 | | Unverified
6 | BaFormer | Average F1 | 72.4 | | Unverified
7 | CETNet | Average F1 | 71.8 | | Unverified
8 | SF-TMN(ASFormer) | Average F1 | 71.6 | | Unverified
9 | RF++-SSTDA | Acc | 70.8 | | Unverified
10 | ASPnet | Average F1 | 70.6 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Br-Prompt+ASPnet (RGB, flow, accelerometer) | F1@50% | 88.5 | | Unverified
2 | Semantic2Graph | F1@50% | 87.3 | | Unverified
3 | BaFormer | F1@50% | 83.9 | | Unverified
4 | DiffAct | F1@50% | 83.7 | | Unverified
5 | SF-TMN(ASFormer) | F1@50% | 82.9 | | Unverified
6 | LTContext | F1@50% | 82 | | Unverified
7 | UVAST | F1@50% | 81.7 | | Unverified
8 | Br-Prompt+ASFormer | F1@50% | 81.3 | | Unverified
9 | EUT | F1@50% | 81 | | Unverified
10 | CETNet | F1@50% | 80.1 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Semantic2Graph | F1@50% | 91.3 | | Unverified
2 | FACT | F1@50% | 87.5 | | Unverified
3 | DiffAct | F1@50% | 84.7 | | Unverified
4 | BaFormer | F1@50% | 83.5 | | Unverified
5 | SF-TMN(ASFormer) | F1@50% | 83.1 | | Unverified
6 | Br-Prompt+ASFormer | F1@50% | 83 | | Unverified
7 | DPRN | F1@50% | 82.9 | | Unverified
8 | BIT | F1@50% | 82.6 | | Unverified
9 | CETNet | F1@50% | 81.3 | | Unverified
10 | UVAST | F1@50% | 81 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | UnLoc-L | Frame accuracy | 72.8 | | Unverified
2 | Univl | Frame accuracy | 70 | | Unverified
3 | Norton | Frame accuracy | 69.8 | | Unverified
4 | VideoClip | Frame accuracy | 68.7 | | Unverified
5 | TACo | Frame accuracy | 68.4 | | Unverified
6 | VLM | Frame accuracy | 68.4 | | Unverified
7 | MIL-NCE | Frame accuracy | 61 | | Unverified
8 | ActBERT | Frame accuracy | 57 | | Unverified
9 | CBT | Frame accuracy | 53.9 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | ASQuery | F1@10% | 37.8 | | Unverified
2 | LTContext | F1@10% | 33.9 | | Unverified
3 | ASFormer | F1@10% | 33.4 | | Unverified
4 | C2F-TCN | F1@10% | 33.3 | | Unverified
5 | UVAST | F1@10% | 32.1 | | Unverified
6 | MS-TCN++ | F1@10% | 31.6 | | Unverified
7 | ProTAS(Offline) | F1@10% | 28.7 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | RL+Tree | Edit Distance | 88.53 | | Unverified
2 | RL (full) | Edit Distance | 87.96 | | Unverified
3 | TricorNet | Edit Distance | 86.8 | | Unverified
4 | SDL+SC-CRF | Edit Distance | 86.21 | | Unverified
5 | TCN | Edit Distance | 83.1 | | Unverified
6 | ST-CNN+Seg | Edit Distance | 66.56 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | TSA (FINCH) | Acc | 62.4 | | Unverified
2 | TSA (Kmeans) | Acc | 59.7 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | EUT | Acc | 87.4 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Unsup. TW-FINCH (K=avg/activity) | Accuracy | 42 | | Unverified