SOTAVerified

Action Segmentation

Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, it aims to segment a temporally untrimmed video in time and label each segment with one of a set of pre-defined action labels. The results can then serve as input to downstream applications such as video-to-text and action localization.

Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
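
As a rough illustration of the task output described above, the following minimal sketch collapses a sequence of per-frame action labels into labeled temporal segments. The function name `frames_to_segments`, the example labels, and the half-open `(label, start, end)` convention are assumptions made for illustration; they are not taken from TricorNet or any listed paper.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) segments.

    `end` is exclusive, so a segment covers frames start .. end - 1.
    This layout is an illustrative assumption, not a standard format.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical per-frame predictions for a 7-frame video.
frame_labels = ["pour", "pour", "pour", "stir", "stir", "background", "pour"]
segments = frames_to_segments(frame_labels)
print(segments)
```

Each tuple names one action and the frame range it occupies, which is the kind of structure a downstream application could consume.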

Papers

Showing 1-50 of 219 papers

| Title | Status | Hype |
| --- | --- | --- |
| FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation | Code | 2 |
| Hierarchical NeuroSymbolic Approach for Comprehensive and Explainable Action Quality Assessment | Code | 2 |
| Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation | Code | 2 |
| Is Weakly-supervised Action Segmentation Ready For Human-Robot Interaction? No, Let's Improve It With Action-union Learning | Code | 2 |
| Multi-granularity Correspondence Learning from Long-term Noisy Videos | Code | 2 |
| Temporal Action Segmentation: An Analysis of Modern Techniques | Code | 2 |
| Temporal Alignment Networks for Long-term Video | Code | 1 |
| RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks | Code | 1 |
| Temporal Convolutional Networks for Action Segmentation and Detection | Code | 1 |
| LOGO: A Long-Form Video Dataset for Group Action Quality Assessment | Code | 1 |
| Pretrained Language Models as Visual Planners for Human Assistance | Code | 1 |
| Refining Action Segmentation With Hierarchical Video Representations | Code | 1 |
| Skeleton-Based Action Segmentation with Multi-Stage Spatial-Temporal Graph Convolutional Neural Networks | Code | 1 |
| Temporal Action Segmentation from Timestamp Supervision | Code | 1 |
| Iterative Contrast-Classify For Semi-supervised Temporal Action Segmentation | Code | 1 |
| Long-Tail Temporal Action Segmentation with Group-wise Temporal Logit Adjustment | Code | 1 |
| Global2Local: Efficient Structure Search for Video Action Segmentation | Code | 1 |
| Automated freezing of gait assessment with marker-based motion capture and multi-stage spatial-temporal graph convolutional neural networks | Code | 1 |
| How Much Temporal Long-Term Context is Needed for Action Segmentation? | Code | 1 |
| Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos | Code | 1 |
| Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation | Code | 1 |
| Leveraging triplet loss for unsupervised action segmentation | Code | 1 |
| Diffusion Action Segmentation | Code | 1 |
| Permutation-Aware Action Segmentation via Unsupervised Frame-to-Segment Alignment | Code | 1 |
| Alleviating Over-segmentation Errors by Detecting Action Boundaries | Code | 1 |
| Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos | Code | 1 |
| SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation | Code | 1 |
| Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency | Code | 1 |
| Streaming Video Temporal Action Segmentation In Real Time | Code | 1 |
| EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models | Code | 1 |
| 3D Pose-Based Temporal Action Segmentation for Figure Skating: A Fine-Grained and Jump Procedure-Aware Annotation Approach | Code | 1 |
| ASQuery: A Query-based Model for Action Segmentation | Code | 1 |
| End-to-End Learning of Visual Representations from Uncurated Instructional Videos | Code | 1 |
| Activity Grammars for Temporal Action Segmentation | Code | 1 |
| ASFormer: Transformer for Action Segmentation | Code | 1 |
| Efficient Two-Step Networks for Temporal Action Segmentation | Code | 1 |
| Actor and Action Video Segmentation from a Sentence | Code | 1 |
| Few-Shot Temporal Action Localization with Query Adaptive Transformer | Code | 1 |
| HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction | Code | 1 |
| Boundary-Aware Cascade Networks for Temporal Action Segmentation | Code | 1 |
| End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning | Code | 1 |
| A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation | Code | 1 |
| Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation | Code | 1 |
| Coarse to Fine Multi-Resolution Temporal Convolutional Network | Code | 1 |
| Learning Discriminative Prototypes with Dynamic Time Warping | Code | 1 |
| Learning to Segment Actions from Observation and Narration | Code | 1 |
| Hierarchical Vector Quantization for Unsupervised Action Segmentation | Code | 1 |
| MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation | Code | 1 |
| Alleviating Class-wise Gradient Imbalance for Pulmonary Airway Segmentation | Code | 1 |
| Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | Code | 1 |

Benchmark Results

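| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | AdaFocus (newly extracted I3D-features, LT-Context model) | Average F1 | 76.2 | | Unverified |
| 2 | FACT (efficient hybrid of convolution and transformer model) | Average F1 | 74.7 | | Unverified |
| 3 | ASQuery | Average F1 | 74.6 | | Unverified |
| 4 | BIT | Average F1 | 73.7 | | Unverified |
| 5 | DiffAct | Average F1 | 73.6 | | Unverified |
| 6 | BaFormer | Average F1 | 72.4 | | Unverified |
| 7 | CETNet | Average F1 | 71.8 | | Unverified |
| 8 | SF-TMN(ASFormer) | Average F1 | 71.6 | | Unverified |
| 9 | RF++-SSTDA | Acc | 70.8 | | Unverified |
| 10 | ASPnet | Average F1 | 70.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Br-Prompt+ASPnet (RGB, flow, accelerometer) | F1@50% | 88.5 | | Unverified |
| 2 | Semantic2Graph | F1@50% | 87.3 | | Unverified |
| 3 | BaFormer | F1@50% | 83.9 | | Unverified |
| 4 | DiffAct | F1@50% | 83.7 | | Unverified |
| 5 | SF-TMN(ASFormer) | F1@50% | 82.9 | | Unverified |
| 6 | LTContext | F1@50% | 82 | | Unverified |
| 7 | UVAST | F1@50% | 81.7 | | Unverified |
| 8 | Br-Prompt+ASFormer | F1@50% | 81.3 | | Unverified |
| 9 | EUT | F1@50% | 81 | | Unverified |
| 10 | CETNet | F1@50% | 80.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Semantic2Graph | F1@50% | 91.3 | | Unverified |
| 2 | FACT | F1@50% | 87.5 | | Unverified |
| 3 | DiffAct | F1@50% | 84.7 | | Unverified |
| 4 | BaFormer | F1@50% | 83.5 | | Unverified |
| 5 | SF-TMN(ASFormer) | F1@50% | 83.1 | | Unverified |
| 6 | Br-Prompt+ASFormer | F1@50% | 83 | | Unverified |
| 7 | DPRN | F1@50% | 82.9 | | Unverified |
| 8 | BIT | F1@50% | 82.6 | | Unverified |
| 9 | CETNet | F1@50% | 81.3 | | Unverified |
| 10 | UVAST | F1@50% | 81 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | UnLoc-L | Frame accuracy | 72.8 | | Unverified |
| 2 | Univl | Frame accuracy | 70 | | Unverified |
| 3 | Norton | Frame accuracy | 69.8 | | Unverified |
| 4 | VideoClip | Frame accuracy | 68.7 | | Unverified |
| 5 | TACo | Frame accuracy | 68.4 | | Unverified |
| 6 | VLM | Frame accuracy | 68.4 | | Unverified |
| 7 | MIL-NCE | Frame accuracy | 61 | | Unverified |
| 8 | ActBERT | Frame accuracy | 57 | | Unverified |
| 9 | CBT | Frame accuracy | 53.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ASQuery | F1@10% | 37.8 | | Unverified |
| 2 | LTContext | F1@10% | 33.9 | | Unverified |
| 3 | ASFormer | F1@10% | 33.4 | | Unverified |
| 4 | C2F-TCN | F1@10% | 33.3 | | Unverified |
| 5 | UVAST | F1@10% | 32.1 | | Unverified |
| 6 | MS-TCN++ | F1@10% | 31.6 | | Unverified |
| 7 | ProTAS(Offline) | F1@10% | 28.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | RL+Tree | Edit Distance | 88.53 | | Unverified |
| 2 | RL (full) | Edit Distance | 87.96 | | Unverified |
| 3 | TricorNet | Edit Distance | 86.8 | | Unverified |
| 4 | SDL+SC-CRF | Edit Distance | 86.21 | | Unverified |
| 5 | TCN | Edit Distance | 83.1 | | Unverified |
| 6 | ST-CNN+Seg | Edit Distance | 66.56 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TSA (FINCH) | Acc | 62.4 | | Unverified |
| 2 | TSA (Kmeans) | Acc | 59.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | EUT | Acc | 87.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Unsup. TW-FINCH (K=avg/activity) | Accuracy | 42 | | Unverified |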
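
The benchmark results report metrics such as frame accuracy and edit distance without defining them. The sketch below shows one common way these two are computed from per-frame labels, assuming frame accuracy is the percentage of correctly labeled frames and the edit score is a normalized Levenshtein distance over segment label sequences. The function names are hypothetical, and individual leaderboard entries may use different protocols (for example, ignoring background frames), so this should not be read as the evaluation code behind any number listed here.

```python
from itertools import groupby


def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label equals the ground truth."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of frames")
    if not gt:
        return 0.0
    correct = sum(p == g for p, g in zip(pred, gt))
    return 100.0 * correct / len(gt)


def segment_labels(frames):
    """Ordered segment labels, with consecutive duplicate frames collapsed."""
    return [label for label, _ in groupby(frames)]


def edit_score(pred, gt):
    """100 * (1 - normalized Levenshtein distance) between segment label sequences."""
    p, g = segment_labels(pred), segment_labels(gt)
    if not p and not g:
        return 100.0
    # Dynamic-programming Levenshtein distance, keeping one row at a time.
    prev = list(range(len(g) + 1))
    for i, p_label in enumerate(p, start=1):
        curr = [i]
        for j, g_label in enumerate(g, start=1):
            cost = 0 if p_label == g_label else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * (1 - prev[-1] / max(len(p), len(g)))


# Hypothetical 6-frame example.
gt = ["a", "a", "a", "b", "b", "c"]
shifted = ["a", "a", "b", "b", "b", "c"]      # boundary off by one frame
fragmented = ["a", "b", "a", "b", "b", "c"]   # over-segmented prediction
print(frame_accuracy(shifted, gt), edit_score(shifted, gt))
print(frame_accuracy(fragmented, gt), edit_score(fragmented, gt))
```

In this toy example both predictions get the same number of frames right, but only the over-segmented one is penalized by the edit score, which is why segment-level metrics are typically reported alongside frame-level accuracy.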