SOTAVerified

Action Segmentation

Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, it aims to divide a temporally untrimmed video into segments along the time axis and to label each segment with one of a set of predefined action labels. The results can then serve as input to other applications, such as video-to-text and action localization.

Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
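
As a minimal sketch of the output described above, assume a model has already produced one action label per frame; collapsing runs of identical labels then yields the labeled temporal segments. The function name `frames_to_segments`, the label strings, and the half-open `(label, start, end)` convention are illustrative assumptions, not taken from any paper listed here.

```python
from itertools import groupby
from typing import List, Tuple

# (label, start_frame, end_frame), with end_frame exclusive. Assumed convention.
Segment = Tuple[str, int, int]


def frames_to_segments(frame_labels: List[str]) -> List[Segment]:
    """Collapse runs of identical per-frame labels into labeled segments."""
    segments: List[Segment] = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical per-frame predictions for a 10-frame video.
frame_labels = ["pour"] * 3 + ["stir"] * 5 + ["pour"] * 2
segments = frames_to_segments(frame_labels)
print(segments)
# [('pour', 0, 3), ('stir', 3, 8), ('pour', 8, 10)]
```

The same action label can appear in more than one segment, which is why segments are kept as an ordered list rather than a mapping keyed by label.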

Papers

Showing 51–100 of 219 papers

Title | Status | Hype
Skeleton-Based Action Segmentation with Multi-Stage Spatial-Temporal Graph Convolutional Neural Networks | Code | 1
Automated freezing of gait assessment with marker-based motion capture and multi-stage spatial-temporal graph convolutional neural networks | Code | 1
Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos | Code | 1
Diffusion Action Segmentation | Code | 1
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation | Code | 1
Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering | Code | 1
Fast Weakly Supervised Action Segmentation Using Mutual Consistency | Code | 0
Efficient Temporal Action Segmentation via Boundary-aware Query Voting | Code | 0
Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment | Code | 0
Weakly-Supervised Action Segmentation and Alignment via Transcript-Aware Union-of-Subspaces Learning | Code | 0
Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment | Code | 0
Do we really need temporal convolutions in action segmentation? | Code | 0
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | Code | 0
Unsupervised learning of action classes with continuous temporal embedding | Code | 0
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | Code | 0
UnLoc: A Unified Framework for Video Localization Tasks | Code | 0
Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling | Code | 0
Weakly Supervised Energy-Based Learning for Action Segmentation | Code | 0
A Multimodal Handover Failure Detection Dataset and Baselines | Code | 0
Deep Reinforcement Learning for Surgical Gesture Segmentation and Classification | Code | 0
Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints | Code | 0
Transformer with Controlled Attention for Synchronous Motion Captioning | Code | 0
Cross-Enhancement Transformer for Action Segmentation | Code | 0
Cost-Sensitive Learning for Long-Tailed Temporal Action Segmentation | Code | 0
Temporal Human Action Segmentation via Dynamic Clustering | Code | 0
ActBERT: Learning Global-Local Video-Text Representations | Code | 0
Timestamp-Supervised Action Segmentation from the Perspective of Clustering | Code | 0
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer | Code | 0
Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs | Code | 0
SMC-NCA: Semantic-guided Multi-level Contrast for Semi-supervised Temporal Action Segmentation | Code | 0
Snippet-Aware Transformer With Multiple Action Elements for Skeleton-Based Action Segmentation | Code | 0
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Code | 0
Semi-Weakly-Supervised Learning of Complex Actions From Instructional Task Videos | Code | 0
SigFormer: Sparse Signal-Guided Transformer for Multi-Modal Human Action Segmentation | Code | 0
Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos | Code | 0
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding | Code | 0
ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding | Code | 0
Temporal Unet: Sample Level Human Action Recognition using WiFi | Code | 0
Temporal Convolutional Networks: A Unified Approach to Action Segmentation | Code | 0
Online Spatiotemporal Action Detection and Prediction via Causal Representations | Code | 0
OnlineTAS: An Online Baseline for Temporal Action Segmentation | Code | 0
OTAS: Unsupervised Boundary Detection for Object-Centric Temporal Action Segmentation | Code | 0
A study of animal action segmentation algorithms across supervised, unsupervised, and semi-supervised learning paradigms | Code | 0
Frontal Low-rank Random Tensors for Fine-grained Action Segmentation | Code | 0
MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation | Code | 0
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities | Code | 0
Language-Assisted Human Part Motion Learning for Skeleton-Based Temporal Action Segmentation | Code | 0
MS-TCRNet: Multi-Stage Temporal Convolutional Recurrent Networks for Action Segmentation Using Sensor-Augmented Kinematics | Code | 0
Toward Ergonomic Risk Prediction via Segmentation of Indoor Object Manipulation Actions Using Spatiotemporal Convolutional Networks | Code | 0
You Can Wash Hands Better: Accurate Daily Handwashing Assessment with a Smartwatch | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AdaFocus (newly extracted I3D-features, LT-Context model) | Average F1 | 76.2 | | Unverified
2 | FACT (efficient hybrid of convolution and transformer model) | Average F1 | 74.7 | | Unverified
3 | ASQuery | Average F1 | 74.6 | | Unverified
4 | BIT | Average F1 | 73.7 | | Unverified
5 | DiffAct | Average F1 | 73.6 | | Unverified
6 | BaFormer | Average F1 | 72.4 | | Unverified
7 | CETNet | Average F1 | 71.8 | | Unverified
8 | SF-TMN(ASFormer) | Average F1 | 71.6 | | Unverified
9 | RF++-SSTDA | Acc | 70.8 | | Unverified
10 | ASPnet | Average F1 | 70.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Br-Prompt+ASPnet (RGB, flow, accelerometer) | F1@50% | 88.5 | | Unverified
2 | Semantic2Graph | F1@50% | 87.3 | | Unverified
3 | BaFormer | F1@50% | 83.9 | | Unverified
4 | DiffAct | F1@50% | 83.7 | | Unverified
5 | SF-TMN(ASFormer) | F1@50% | 82.9 | | Unverified
6 | LTContext | F1@50% | 82 | | Unverified
7 | UVAST | F1@50% | 81.7 | | Unverified
8 | Br-Prompt+ASFormer | F1@50% | 81.3 | | Unverified
9 | EUT | F1@50% | 81 | | Unverified
10 | CETNet | F1@50% | 80.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Semantic2Graph | F1@50% | 91.3 | | Unverified
2 | FACT | F1@50% | 87.5 | | Unverified
3 | DiffAct | F1@50% | 84.7 | | Unverified
4 | BaFormer | F1@50% | 83.5 | | Unverified
5 | SF-TMN(ASFormer) | F1@50% | 83.1 | | Unverified
6 | Br-Prompt+ASFormer | F1@50% | 83 | | Unverified
7 | DPRN | F1@50% | 82.9 | | Unverified
8 | BIT | F1@50% | 82.6 | | Unverified
9 | CETNet | F1@50% | 81.3 | | Unverified
10 | UVAST | F1@50% | 81 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | UnLoc-L | Frame accuracy | 72.8 | | Unverified
2 | Univl | Frame accuracy | 70 | | Unverified
3 | Norton | Frame accuracy | 69.8 | | Unverified
4 | VideoClip | Frame accuracy | 68.7 | | Unverified
5 | TACo | Frame accuracy | 68.4 | | Unverified
6 | VLM | Frame accuracy | 68.4 | | Unverified
7 | MIL-NCE | Frame accuracy | 61 | | Unverified
8 | ActBERT | Frame accuracy | 57 | | Unverified
9 | CBT | Frame accuracy | 53.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ASQuery | F1@10% | 37.8 | | Unverified
2 | LTContext | F1@10% | 33.9 | | Unverified
3 | ASFormer | F1@10% | 33.4 | | Unverified
4 | C2F-TCN | F1@10% | 33.3 | | Unverified
5 | UVAST | F1@10% | 32.1 | | Unverified
6 | MS-TCN++ | F1@10% | 31.6 | | Unverified
7 | ProTAS(Offline) | F1@10% | 28.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | RL+Tree | Edit Distance | 88.53 | | Unverified
2 | RL (full) | Edit Distance | 87.96 | | Unverified
3 | TricorNet | Edit Distance | 86.8 | | Unverified
4 | SDL+SC-CRF | Edit Distance | 86.21 | | Unverified
5 | TCN | Edit Distance | 83.1 | | Unverified
6 | ST-CNN+Seg | Edit Distance | 66.56 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TSA (FINCH) | Acc | 62.4 | | Unverified
2 | TSA (Kmeans) | Acc | 59.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EUT | Acc | 87.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Unsup. TW-FINCH (K=avg/activity) | Accuracy | 42 | | Unverified