
Action Segmentation

Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, Action Segmentation divides a temporally untrimmed video into time segments and labels each segment with one of a set of predefined action labels. The resulting segmentation can then serve as input to downstream applications such as video-to-text and action localization.
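
Most methods produce one label per frame, and the segmentation is then read off as maximal runs of identical labels. The sketch below shows that conversion in plain Python. The function name `frames_to_segments`, the `(label, start, end)` tuple layout with an exclusive end frame, and the toy labels are illustrative assumptions, not part of any method listed on this page.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

# (label, start_frame, end_frame_exclusive); hypothetical layout for illustration
Segment = Tuple[str, int, int]


def frames_to_segments(frame_labels: Sequence[str]) -> List[Segment]:
    """Collapse per-frame labels into maximal runs of the same label."""
    segments: List[Segment] = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


if __name__ == "__main__":
    preds = ["bg", "bg", "pour", "pour", "pour", "stir", "stir", "bg"]
    for seg in frames_to_segments(preds):
        print(seg)
    # ('bg', 0, 2)
    # ('pour', 2, 5)
    # ('stir', 5, 7)
    # ('bg', 7, 8)
```

Using `itertools.groupby` keeps the conversion linear in the number of frames; the segment list is also the natural input for the segment-level metrics reported in the benchmark tables.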

Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation

Papers

Showing 151–200 of 219 papers

| Title | Status | Hype |
|---|---|---|
| Semi-Weakly-Supervised Learning of Complex Actions From Instructional Task Videos | Code | 0 |
| You Can Wash Hands Better: Accurate Daily Handwashing Assessment with a Smartwatch | Code | 0 |
| Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos | Code | 0 |
| VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | Code | 0 |
| Long Short View Feature Decomposition via Contrastive Video Representation Learning | | 0 |
| TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment | | 0 |
| Temporal Action Segmentation with High-level Complex Activity Labels | | 0 |
| FIFA: Fast Inference Approximation for Action Segmentation | | 0 |
| Unsupervised Action Segmentation for Instructional Videos | | 0 |
| SSCAP: Self-supervised Co-occurrence Action Parsing for Unsupervised Temporal Action Segmentation | | 0 |
| VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | Code | 0 |
| Unsupervised Discriminative Embedding for Sub-Action Learning in Complex Activities | | 0 |
| Action in Mind: A Neural Network Approach to Action Recognition and Segmentation | | 0 |
| Action Segmentation with Mixed Temporal Domain Adaptation | | 0 |
| Action Shuffle Alternating Learning for Unsupervised Action Segmentation | | 0 |
| Anchor-Constrained Viterbi for Set-Supervised Action Segmentation | | 0 |
| Depthwise Separable Temporal Convolutional Network for Action Segmentation | | 0 |
| Weakly-Supervised Action Segmentation and Alignment via Transcript-Aware Union-of-Subspaces Learning | Code | 0 |
| ActBERT: Learning Global-Local Video-Text Representations | Code | 0 |
| Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery | | 0 |
| Actor and Action Modular Network for Text-based Video Segmentation | | 0 |
| Online Spatiotemporal Action Detection and Prediction via Causal Representations | Code | 0 |
| Improving Action Segmentation via Graph-Based Temporal Reasoning | | 0 |
| Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos | | 0 |
| On Evaluating Weakly Supervised Action Segmentation Methods | | 0 |
| Hierarchical Attention Network for Action Segmentation | | 0 |
| Learning a Weakly-Supervised Video Actor-Action Segmentation Model with a Wise Selection | | 0 |
| Set-Constrained Viterbi for Set-Supervised Action Segmentation | | 0 |
| Automatic Gesture Recognition in Robot-assisted Surgery with Reinforcement Learning and Tree Search | | 0 |
| Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences | | 0 |
| Human Action Sequence Classification | | 0 |
| Weakly Supervised Energy-Based Learning for Action Segmentation | Code | 0 |
| Fine-grained Action Segmentation using the Semi-Supervised Action GAN | | 0 |
| Coupled Generative Adversarial Network for Continuous Fine-grained Action Segmentation | | 0 |
| An Efficient 3D CNN for Action/Object Segmentation in Video | | 0 |
| Frontal Low-rank Random Tensors for Fine-grained Action Segmentation | Code | 0 |
| A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation | | 0 |
| Representation Learning on Visual-Symbolic Graphs for Video Understanding | | 0 |
| Temporal Unet: Sample Level Human Action Recognition using WiFi | Code | 0 |
| Unsupervised learning of action classes with continuous temporal embedding | Code | 0 |
| Fast Weakly Supervised Action Segmentation Using Mutual Consistency | Code | 0 |
| MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation | Code | 0 |
| Fine-Grained Semantic Segmentation of Motion Capture Data using Dilated Temporal Fully-Convolutional Networks | | 0 |
| Toward Ergonomic Risk Prediction via Segmentation of Indoor Object Manipulation Actions Using Spatiotemporal Convolutional Networks | Code | 0 |
| Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation | | 0 |
| Actor-Action Semantic Segmentation with Region Masks | | 0 |
| Dilated Temporal Fully-Convolutional Network for Semantic Segmentation of Motion Capture Data | | 0 |
| Deep Reinforcement Learning for Surgical Gesture Segmentation and Classification | Code | 0 |
| Temporal Deformable Residual Networks for Action Segmentation in Videos | | 0 |
| VideoCapsuleNet: A Simplified Network for Action Detection | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AdaFocus (newly extracted I3D-features, LT-Context model) | Average F1 | 76.2 | | Unverified |
| 2 | FACT (efficient hybrid of convolution and transformer model) | Average F1 | 74.7 | | Unverified |
| 3 | ASQuery | Average F1 | 74.6 | | Unverified |
| 4 | BIT | Average F1 | 73.7 | | Unverified |
| 5 | DiffAct | Average F1 | 73.6 | | Unverified |
| 6 | BaFormer | Average F1 | 72.4 | | Unverified |
| 7 | CETNet | Average F1 | 71.8 | | Unverified |
| 8 | SF-TMN(ASFormer) | Average F1 | 71.6 | | Unverified |
| 9 | RF++-SSTDA | Acc | 70.8 | | Unverified |
| 10 | ASPnet | Average F1 | 70.6 | | Unverified |
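
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Br-Prompt+ASPnet (RGB, flow, accelerometer) | F1@50% | 88.5 | | Unverified |
| 2 | Semantic2Graph | F1@50% | 87.3 | | Unverified |
| 3 | BaFormer | F1@50% | 83.9 | | Unverified |
| 4 | DiffAct | F1@50% | 83.7 | | Unverified |
| 5 | SF-TMN(ASFormer) | F1@50% | 82.9 | | Unverified |
| 6 | LTContext | F1@50% | 82 | | Unverified |
| 7 | UVAST | F1@50% | 81.7 | | Unverified |
| 8 | Br-Prompt+ASFormer | F1@50% | 81.3 | | Unverified |
| 9 | EUT | F1@50% | 81 | | Unverified |
| 10 | CETNet | F1@50% | 80.1 | | Unverified |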
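
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Semantic2Graph | F1@50% | 91.3 | | Unverified |
| 2 | FACT | F1@50% | 87.5 | | Unverified |
| 3 | DiffAct | F1@50% | 84.7 | | Unverified |
| 4 | BaFormer | F1@50% | 83.5 | | Unverified |
| 5 | SF-TMN(ASFormer) | F1@50% | 83.1 | | Unverified |
| 6 | Br-Prompt+ASFormer | F1@50% | 83 | | Unverified |
| 7 | DPRN | F1@50% | 82.9 | | Unverified |
| 8 | BIT | F1@50% | 82.6 | | Unverified |
| 9 | CETNet | F1@50% | 81.3 | | Unverified |
| 10 | UVAST | F1@50% | 81 | | Unverified |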
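
The F1@50% columns in the leaderboards above are segment-level scores. Under a commonly used definition, a predicted segment is a true positive if its temporal IoU with the best-overlapping ground-truth segment of the same class reaches the threshold (here 50%) and that ground-truth segment has not already been matched; otherwise it is a false positive, and ground-truth segments left unmatched are false negatives. The sketch below implements that definition on per-frame string labels. The function names and the optional `ignore` set (for example, a background class) are hypothetical, and individual benchmarks may differ in details such as background handling.

```python
from itertools import groupby
from typing import Collection, List, Sequence, Tuple

Segment = Tuple[str, int, int]  # (label, start_frame, end_frame_exclusive)


def to_segments(labels: Sequence[str]) -> List[Segment]:
    """Collapse per-frame labels into maximal constant-label runs."""
    segments: List[Segment] = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two frame intervals (0.0 if disjoint)."""
    inter = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    union = max(a[2], b[2]) - min(a[1], b[1])
    return inter / union if union > 0 else 0.0


def segmental_f1(pred: Sequence[str], gt: Sequence[str],
                 threshold: float = 0.5,
                 ignore: Collection[str] = ()) -> float:
    """Segmental F1 at an IoU threshold, returned as a percentage."""
    pred_segs = [s for s in to_segments(pred) if s[0] not in ignore]
    gt_segs = [s for s in to_segments(gt) if s[0] not in ignore]
    matched = [False] * len(gt_segs)
    tp = fp = 0
    for seg in pred_segs:
        # Best-overlapping ground-truth segment with the same label
        # (first one wins on ties).
        best_iou, best_idx = 0.0, -1
        for idx, g in enumerate(gt_segs):
            if g[0] == seg[0]:
                score = temporal_iou(seg, g)
                if score > best_iou:
                    best_iou, best_idx = score, idx
        # A hit needs enough overlap and a ground truth not matched before.
        if best_idx >= 0 and best_iou >= threshold and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
        else:
            fp += 1
    fn = len(gt_segs) - sum(matched)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gt = ["a"] * 4 + ["b"] * 4
    over_segmented = ["a", "a", "b", "a", "b", "b", "b", "b"]
    print(round(segmental_f1(over_segmented, gt, threshold=0.5), 2))  # 66.67
```

In the usage example, the two spurious short segments become false positives even though most frames are labeled correctly, which is why segmental F1 penalizes over-segmentation more strongly than frame accuracy does.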
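
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | UnLoc-L | Frame accuracy | 72.8 | | Unverified |
| 2 | Univl | Frame accuracy | 70 | | Unverified |
| 3 | Norton | Frame accuracy | 69.8 | | Unverified |
| 4 | VideoClip | Frame accuracy | 68.7 | | Unverified |
| 5 | TACo | Frame accuracy | 68.4 | | Unverified |
| 6 | VLM | Frame accuracy | 68.4 | | Unverified |
| 7 | MIL-NCE | Frame accuracy | 61 | | Unverified |
| 8 | ActBERT | Frame accuracy | 57 | | Unverified |
| 9 | CBT | Frame accuracy | 53.9 | | Unverified |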
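
Frame accuracy, as in the table above (and abbreviated Acc or Accuracy in other tables on this page), is the percentage of frames whose predicted label matches the ground truth. A minimal sketch, assuming equal-length per-frame label lists; the function name and the `ignore` parameter are hypothetical, the latter included only because whether background frames are counted can vary by benchmark.

```python
from typing import Collection, Sequence


def frame_accuracy(pred: Sequence[str], gt: Sequence[str],
                   ignore: Collection[str] = ()) -> float:
    """Percentage of frames whose predicted label equals the ground truth.

    Frames whose ground-truth label is in `ignore` are skipped (a hypothetical
    option; whether background frames count depends on the benchmark protocol).
    """
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same length")
    kept = [(p, g) for p, g in zip(pred, gt) if g not in ignore]
    if not kept:
        return 0.0
    correct = sum(1 for p, g in kept if p == g)
    return 100.0 * correct / len(kept)


if __name__ == "__main__":
    gt = ["bg", "pour", "pour", "stir"]
    pred = ["bg", "bg", "pour", "stir"]
    print(frame_accuracy(pred, gt))                 # 75.0
    print(frame_accuracy(pred, gt, ignore={"bg"}))  # 66.66666666666667
```

Because it is computed per frame, this metric is dominated by long segments and is insensitive to short spurious segments, which is why leaderboards usually report it alongside segment-level scores such as F1@k and edit score.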
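
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ASQuery | F1@10% | 37.8 | | Unverified |
| 2 | LTContext | F1@10% | 33.9 | | Unverified |
| 3 | ASFormer | F1@10% | 33.4 | | Unverified |
| 4 | C2F-TCN | F1@10% | 33.3 | | Unverified |
| 5 | UVAST | F1@10% | 32.1 | | Unverified |
| 6 | MS-TCN++ | F1@10% | 31.6 | | Unverified |
| 7 | ProTAS(Offline) | F1@10% | 28.7 | | Unverified |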
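
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | RL+Tree | Edit Distance | 88.53 | | Unverified |
| 2 | RL (full) | Edit Distance | 87.96 | | Unverified |
| 3 | TricorNet | Edit Distance | 86.8 | | Unverified |
| 4 | SDL+SC-CRF | Edit Distance | 86.21 | | Unverified |
| 5 | TCN | Edit Distance | 83.1 | | Unverified |
| 6 | ST-CNN+Seg | Edit Distance | 66.56 | | Unverified |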
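
The Edit Distance values above are ranked higher-is-better, which is consistent with a normalized segmental edit score rather than a raw distance. The usual normalization takes the Levenshtein distance between the predicted and ground-truth sequences of segment labels, divides it by the length of the longer sequence, and subtracts the result from 1. This rewards correct segment ordering and penalizes over-segmentation regardless of exact boundaries. A sketch assuming that normalization; the helper names and the `ignore` option are hypothetical.

```python
from itertools import groupby
from typing import Collection, List, Sequence


def segment_labels(frame_labels: Sequence[str],
                   ignore: Collection[str] = ()) -> List[str]:
    """Ordered labels of the segments, e.g. a,a,b,b,a -> [a, b, a]."""
    return [label for label, _ in groupby(frame_labels) if label not in ignore]


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Dynamic-programming edit distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1,             # delete x
                          curr[j - 1] + 1,         # insert y
                          prev[j - 1] + (x != y))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def edit_score(pred: Sequence[str], gt: Sequence[str],
               ignore: Collection[str] = ()) -> float:
    """Normalized segmental edit score in percent (100 = same segment order)."""
    p = segment_labels(pred, ignore)
    g = segment_labels(gt, ignore)
    denom = max(len(p), len(g))
    if denom == 0:
        return 100.0  # both sequences empty: treated as a perfect match
    return 100.0 * (1.0 - levenshtein(p, g) / denom)


if __name__ == "__main__":
    gt = ["a", "a", "b", "b"]
    pred = ["a", "b", "a", "b"]  # over-segmented: [a, b, a, b] vs [a, b]
    print(edit_score(pred, gt))  # 50.0
```

Note that `edit_score(["a", "a", "b"], ["a", "b", "b"])` is 100.0 even though one frame is wrong: the edit score only looks at the order of segments, so it complements rather than replaces frame accuracy.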

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TSA (FINCH) | Acc | 62.4 | | Unverified |
| 2 | TSA (Kmeans) | Acc | 59.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EUT | Acc | 87.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Unsup. TW-FINCH (K=avg/activity) | Accuracy | 42 | | Unverified |