
Action Segmentation

Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, Action Segmentation splits a temporally untrimmed video into contiguous time segments and labels each segment with one of a set of predefined action labels. The results of Action Segmentation can then serve as input to downstream applications such as video-to-text and action localization.
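In practice, many models predict one label per frame, and the segments fall out of grouping consecutive frames that share a label. A minimal Python sketch of that grouping step, assuming per-frame predictions are already available (the action names below are hypothetical):

```python
from itertools import groupby


def labels_to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) segments; end is exclusive."""
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


frames = ["take_cup"] * 3 + ["pour_milk"] * 4 + ["stir"] * 2
print(labels_to_segments(frames))
# [('take_cup', 0, 3), ('pour_milk', 3, 7), ('stir', 7, 9)]
```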

Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation

Papers

Showing 151–200 of 219 papers

Title | Status | Hype
Learning Discriminative Prototypes with Dynamic Time Warping | Code | 1
Temporal Action Segmentation from Timestamp Supervision | Code | 1
Depthwise Separable Temporal Convolutional Network for Action Segmentation |  | 0
Global2Local: Efficient Structure Search for Video Action Segmentation | Code | 1
Refining Action Segmentation With Hierarchical Video Representations | Code | 1
Weakly-Supervised Action Segmentation and Alignment via Transcript-Aware Union-of-Subspaces Learning | Code | 0
Temporal Relational Modeling with Self-Supervision for Action Segmentation | Code | 1
Alleviating Class-wise Gradient Imbalance for Pulmonary Airway Segmentation | Code | 1
ActBERT: Learning Global-Local Video-Text Representations | Code | 0
Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery |  | 0
Actor and Action Modular Network for Text-based Video Segmentation |  | 0
Online Spatiotemporal Action Detection and Prediction via Causal Representations | Code | 0
Boundary-Aware Cascade Networks for Temporal Action Segmentation | Code | 1
Alleviating Over-segmentation Errors by Detecting Action Boundaries | Code | 1
MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation | Code | 1
Improving Action Segmentation via Graph-Based Temporal Reasoning |  | 0
Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos |  | 0
On Evaluating Weakly Supervised Action Segmentation Methods |  | 0
Hierarchical Attention Network for Action Segmentation |  | 0
Learning to Segment Actions from Observation and Narration | Code | 1
SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation | Code | 1
Learning a Weakly-Supervised Video Actor-Action Segmentation Model with a Wise Selection |  | 0
Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation | Code | 1
Set-Constrained Viterbi for Set-Supervised Action Segmentation |  | 0
Automatic Gesture Recognition in Robot-assisted Surgery with Reinforcement Learning and Tree Search |  | 0
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation | Code | 1
Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences |  | 0
End-to-End Learning of Visual Representations from Uncurated Instructional Videos | Code | 1
Human Action Sequence Classification |  | 0
Weakly Supervised Energy-Based Learning for Action Segmentation | Code | 0
Coupled Generative Adversarial Network for Continuous Fine-grained Action Segmentation |  | 0
Fine-grained Action Segmentation using the Semi-Supervised Action GAN |  | 0
An Efficient 3D CNN for Action/Object Segmentation in Video |  | 0
Frontal Low-rank Random Tensors for Fine-grained Action Segmentation | Code | 0
A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation |  | 0
Representation Learning on Visual-Symbolic Graphs for Video Understanding |  | 0
Temporal Unet: Sample Level Human Action Recognition using WiFi | Code | 0
Unsupervised learning of action classes with continuous temporal embedding | Code | 0
Fast Weakly Supervised Action Segmentation Using Mutual Consistency | Code | 0
MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation | Code | 0
Fine-Grained Semantic Segmentation of Motion Capture Data using Dilated Temporal Fully-Convolutional Networks |  | 0
Toward Ergonomic Risk Prediction via Segmentation of Indoor Object Manipulation Actions Using Spatiotemporal Convolutional Networks | Code | 0
Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation |  | 0
Actor-Action Semantic Segmentation with Region Masks |  | 0
Dilated Temporal Fully-Convolutional Network for Semantic Segmentation of Motion Capture Data |  | 0
Deep Reinforcement Learning for Surgical Gesture Segmentation and Classification | Code | 0
Temporal Deformable Residual Networks for Action Segmentation in Videos |  | 0
VideoCapsuleNet: A Simplified Network for Action Detection |  | 0
NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning |  | 0
Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AdaFocus (newly extracted I3D-features, LT-Context model) | Average F1 | 76.2 |  | Unverified
2 | FACT (efficient hybrid of convolution and transformer model) | Average F1 | 74.7 |  | Unverified
3 | ASQuery | Average F1 | 74.6 |  | Unverified
4 | BIT | Average F1 | 73.7 |  | Unverified
5 | DiffAct | Average F1 | 73.6 |  | Unverified
6 | BaFormer | Average F1 | 72.4 |  | Unverified
7 | CETNet | Average F1 | 71.8 |  | Unverified
8 | SF-TMN(ASFormer) | Average F1 | 71.6 |  | Unverified
9 | RF++-SSTDA | Acc | 70.8 |  | Unverified
10 | ASPnet | Average F1 | 70.6 |  | Unverified
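Most of these leaderboards rank by segmental F1, which the page does not define. A minimal sketch following the commonly used protocol: a predicted segment is a true positive when its IoU with a not-yet-matched ground-truth segment of the same class reaches the threshold k, otherwise a false positive; unmatched ground-truth segments are false negatives. Two assumptions to flag loudly: "Average F1" is taken here to mean the mean of F1@10%, F1@25% and F1@50%, and the function names are illustrative, not from any benchmark toolkit. Under those assumptions, the F1@50% and F1@10% columns in the later tables correspond to `f1_at_k` with k=0.5 and k=0.1.

```python
from itertools import groupby


def to_segments(labels):
    """Collapse frame-wise labels into (label, start, end) segments; end is exclusive."""
    segments, start = [], 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


def f1_at_k(pred, gt, k):
    """Segmental F1 (in percent) at IoU threshold k, e.g. k=0.5 for F1@50%."""
    pred_segs, gt_segs = to_segments(pred), to_segments(gt)
    matched = [False] * len(gt_segs)  # each ground-truth segment may be matched once
    tp = fp = 0
    for label, ps, pe in pred_segs:
        best_iou, best_j = 0.0, -1
        for j, (g_label, gs, ge) in enumerate(gt_segs):
            if g_label != label or matched[j]:
                continue  # only unmatched same-class segments are candidates
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            if inter / union > best_iou:
                best_iou, best_j = inter / union, j
        if best_j >= 0 and best_iou >= k:
            tp += 1
            matched[best_j] = True
        else:
            fp += 1  # no same-class ground-truth segment overlaps enough
    fn = len(gt_segs) - sum(matched)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 100 * 2 * precision * recall / (precision + recall)


def average_f1(pred, gt, thresholds=(0.10, 0.25, 0.50)):
    """Assumed definition: mean of F1@10%, F1@25% and F1@50%."""
    return sum(f1_at_k(pred, gt, k) for k in thresholds) / len(thresholds)


gt = list("aaaaabbbbbcc")
pred = list("abbbbbbbbbcc")  # most of action "a" is mislabeled as "b"
print(f"F1@10%: {f1_at_k(pred, gt, 0.10):.1f}")   # 100.0
print(f"F1@50%: {f1_at_k(pred, gt, 0.50):.1f}")   # 66.7
print(f"Average F1: {average_f1(pred, gt):.1f}")  # 77.8
```

The short "a" prediction still counts at the loose 10% threshold (IoU 0.2) but not at 50%, which is why papers report several thresholds rather than one.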

# | Model | Metric | Claimed | Verified | Status
1 | Br-Prompt+ASPnet (RGB, flow, accelerometer) | F1@50% | 88.5 |  | Unverified
2 | Semantic2Graph | F1@50% | 87.3 |  | Unverified
3 | BaFormer | F1@50% | 83.9 |  | Unverified
4 | DiffAct | F1@50% | 83.7 |  | Unverified
5 | SF-TMN(ASFormer) | F1@50% | 82.9 |  | Unverified
6 | LTContext | F1@50% | 82 |  | Unverified
7 | UVAST | F1@50% | 81.7 |  | Unverified
8 | Br-Prompt+ASFormer | F1@50% | 81.3 |  | Unverified
9 | EUT | F1@50% | 81 |  | Unverified
10 | CETNet | F1@50% | 80.1 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Semantic2Graph | F1@50% | 91.3 |  | Unverified
2 | FACT | F1@50% | 87.5 |  | Unverified
3 | DiffAct | F1@50% | 84.7 |  | Unverified
4 | BaFormer | F1@50% | 83.5 |  | Unverified
5 | SF-TMN(ASFormer) | F1@50% | 83.1 |  | Unverified
6 | Br-Prompt+ASFormer | F1@50% | 83 |  | Unverified
7 | DPRN | F1@50% | 82.9 |  | Unverified
8 | BIT | F1@50% | 82.6 |  | Unverified
9 | CETNet | F1@50% | 81.3 |  | Unverified
10 | UVAST | F1@50% | 81 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | UnLoc-L | Frame accuracy | 72.8 |  | Unverified
2 | Univl | Frame accuracy | 70 |  | Unverified
3 | Norton | Frame accuracy | 69.8 |  | Unverified
4 | VideoClip | Frame accuracy | 68.7 |  | Unverified
5 | TACo | Frame accuracy | 68.4 |  | Unverified
6 | VLM | Frame accuracy | 68.4 |  | Unverified
7 | MIL-NCE | Frame accuracy | 61 |  | Unverified
8 | ActBERT | Frame accuracy | 57 |  | Unverified
9 | CBT | Frame accuracy | 53.9 |  | Unverified
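Frame accuracy (reported as "Acc" or "Accuracy" in other tables here) is the share of frames whose predicted label matches the ground truth. A minimal sketch, assuming both sequences are frame-aligned label lists of equal length (the function name is illustrative):

```python
def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label equals the ground-truth label."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("need two non-empty, frame-aligned sequences")
    correct = sum(p == g for p, g in zip(pred, gt))
    return 100 * correct / len(gt)


gt = list("aaaaabbbbb")
pred = list("aabaabbabb")  # two isolated wrong frames
print(frame_accuracy(pred, gt))  # 80.0
```

The example also shows the metric's blind spot: the prediction breaks two actions into six segments yet still scores 80%, because frame accuracy ignores segment structure. Segmental F1 and edit scores are used alongside it for that reason.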

# | Model | Metric | Claimed | Verified | Status
1 | ASQuery | F1@10% | 37.8 |  | Unverified
2 | LTContext | F1@10% | 33.9 |  | Unverified
3 | ASFormer | F1@10% | 33.4 |  | Unverified
4 | C2F-TCN | F1@10% | 33.3 |  | Unverified
5 | UVAST | F1@10% | 32.1 |  | Unverified
6 | MS-TCN++ | F1@10% | 31.6 |  | Unverified
7 | ProTAS(Offline) | F1@10% | 28.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | RL+Tree | Edit Distance | 88.53 |  | Unverified
2 | RL (full) | Edit Distance | 87.96 |  | Unverified
3 | TricorNet | Edit Distance | 86.8 |  | Unverified
4 | SDL+SC-CRF | Edit Distance | 86.21 |  | Unverified
5 | TCN | Edit Distance | 83.1 |  | Unverified
6 | ST-CNN+Seg | Edit Distance | 66.56 |  | Unverified
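The "Edit Distance" table ranks higher values first, so the column is presumably the normalized segmental edit score rather than a raw distance: 100 * (1 - Levenshtein distance / length of the longer sequence), computed over the ordered segment labels. A sketch under that assumption (the function name is illustrative):

```python
from itertools import groupby


def edit_score(pred, gt):
    """Normalized segmental edit score in percent (100 means identical segment order)."""
    p = [label for label, _ in groupby(pred)]  # ordered segment labels
    g = [label for label, _ in groupby(gt)]
    m, n = len(p), len(g)
    if max(m, n) == 0:
        return 100.0
    prev = list(range(n + 1))  # Levenshtein DP, one row at a time
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return 100 * (1 - prev[n] / max(m, n))


gt = list("aaaaabbbbb")
pred = list("aabaabbabb")  # over-segmented: a, b, a, b, a, b
print(f"{edit_score(pred, gt):.1f}")  # 33.3
```

Four extra segments must be deleted to recover the true order "a, b", so the score drops to about 33 even though most frames are labeled correctly; this is how the metric penalizes over-segmentation.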

# | Model | Metric | Claimed | Verified | Status
1 | TSA (FINCH) | Acc | 62.4 |  | Unverified
2 | TSA (Kmeans) | Acc | 59.7 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EUT | Acc | 87.4 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Unsup. TW-FINCH (K=avg/activity) | Accuracy | 42 |  | Unverified
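Clustering-based entries such as TW-FINCH or the FINCH/Kmeans variants output cluster IDs rather than action names, so their accuracy can only be scored after mapping clusters to ground-truth classes. The page does not state the protocol; a common choice is a one-to-one mapping that maximizes frame overlap (Hungarian matching), applied per video or per activity depending on the paper. The sketch below solves that same assignment by brute force, which is only practical for a handful of clusters; for realistic cluster counts, `scipy.optimize.linear_sum_assignment` is the usual solver. The function name is illustrative.

```python
from collections import Counter
from itertools import permutations


def matched_accuracy(clusters, gt):
    """Frame accuracy (percent) under the best one-to-one cluster-to-class mapping.

    Brute force over all injective mappings; same objective as Hungarian matching.
    """
    if len(clusters) != len(gt) or not gt:
        raise ValueError("need two non-empty, frame-aligned sequences")
    overlap = Counter(zip(clusters, gt))  # (cluster, class) -> shared frames
    cluster_ids, class_ids = sorted(set(clusters)), sorted(set(gt))
    best = 0
    if len(cluster_ids) <= len(class_ids):
        for assigned in permutations(class_ids, len(cluster_ids)):
            best = max(best, sum(overlap[c, g] for c, g in zip(cluster_ids, assigned)))
    else:  # more clusters than classes: surplus clusters stay unmatched
        for chosen in permutations(cluster_ids, len(class_ids)):
            best = max(best, sum(overlap[c, g] for c, g in zip(chosen, class_ids)))
    return 100 * best / len(gt)


clusters = [7, 7, 3, 3, 3, 5]
gt = ["pour", "pour", "stir", "stir", "drink", "drink"]
print(f"{matched_accuracy(clusters, gt):.1f}")  # 83.3
```

Here the best mapping is 7 to "pour", 3 to "stir" and 5 to "drink", which explains 5 of 6 frames; cluster 3 spills one frame into "drink", and that frame stays wrong under any one-to-one mapping.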