SOTAVerified

Action Segmentation

Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, Action Segmentation aims to split a temporally untrimmed video along the time axis and label each resulting segment with one of a set of pre-defined action labels. The resulting segmentation can then be used as input to downstream applications such as video-to-text and action localization.

Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
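
As a minimal sketch of the definition above, assume a model has already predicted one action label per frame. Grouping consecutive identical labels then yields the labeled temporal segments. The function name `frames_to_segments` and the example labels are hypothetical and used only for illustration.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) segments.

    `start` is inclusive and `end` is exclusive, both in frame indices.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical per-frame predictions for a 9-frame untrimmed clip.
frames = ["pour", "pour", "pour", "stir", "stir",
          "background", "stir", "stir", "stir"]
segments = frames_to_segments(frames)
for label, start, end in segments:
    print(f"{label}: frames [{start}, {end})")
```

Note that the single "background" frame splits "stir" into two segments. Short spurious segments like this are commonly called over-segmentation, which is one reason segment-level metrics are reported alongside frame-level accuracy.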

Papers

Showing 151-200 of 219 papers

Title | Status | Hype
SFGANS Self-supervised Future Generator for human ActioN Segmentation | - | 0
SF-TMN: SlowFast Temporal Modeling Network for Surgical Phase Recognition | - | 0
VideoCapsuleNet: A Simplified Network for Action Detection | - | 0
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | - | 0
Weakly-Supervised Action Segmentation and Unseen Error Detection in Anomalous Instructional Videos | - | 0
Video LLMs for Temporal Reasoning in Long Videos | - | 0
Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation | - | 0
Stitch, Contrast, and Segment: Learning a Human Action Segmentation Model Using Trimmed Skeleton Videos | - | 0
Actor and Action Modular Network for Text-based Video Segmentation | - | 0
Surgical Phase Recognition in Laparoscopic Cholecystectomy | - | 0
ViSTec: Video Modeling for Sports Technique Recognition and Tactical Analysis | - | 0
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment | - | 0
TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering | - | 0
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks | - | 0
Actor-Action Semantic Segmentation with Region Masks | - | 0
Action Understanding with Multiple Classes of Actors | - | 0
Temporal Action Segmentation with High-level Complex Activity Labels | - | 0
Action Shuffle Alternating Learning for Unsupervised Action Segmentation | - | 0
Temporal Context Consistency Above All: Enhancing Long-Term Anticipation by Learning and Enforcing Temporal Constraints | - | 0
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning | - | 0
Action Segmentation with Mixed Temporal Domain Adaptation | - | 0
Temporal Deformable Residual Networks for Action Segmentation in Videos | - | 0
Watch-Bot: Unsupervised Learning for Reminding Humans of Forgotten Actions | - | 0
Action Segmentation Using 2D Skeleton Heatmaps and Multi-Modality Fusion | - | 0
Action parsing using context features | - | 0
Action in Mind: A Neural Network Approach to Action Recognition and Segmentation | - | 0
Temporal Segment Transformer for Action Segmentation | - | 0
Therbligs in Action: Video Understanding through Motion Primitives | - | 0
TimeLogic: A Temporal Logic Benchmark for Video QA | - | 0
Watch-n-Patch: Unsupervised Learning of Actions and Relations | - | 0
Timestamp-Supervised Action Segmentation with Graph Convolutional Networks | - | 0
Towards Generalizing Temporal Action Segmentation to Unseen Views | - | 0
ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation | - | 0
Transformers in Action: Weakly Supervised Action Segmentation | - | 0
Watch-n-Patch: Unsupervised Understanding of Actions and Relations | - | 0
TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation | - | 0
Turning to a Teacher for Timestamp Supervised Temporal Action Segmentation | - | 0
Enhancing Transformer Backbone for Egocentric Video Action Segmentation | - | 0
Error Detection in Egocentric Procedural Task Videos | - | 0
Exploring Temporally Dynamic Data Augmentation for Video Recognition | - | 0
End-to-End Fine-Grained Action Segmentation and Recognition Using Conditional Random Field Models and Discriminative Sparse Coding | - | 0
Fast and Unsupervised Action Boundary Detection for Action Segmentation | - | 0
End-to-End Action Segmentation Transformer | - | 0
FIFA: Fast Inference Approximation for Action Segmentation | - | 0
Fine-grained Action Segmentation using the Semi-Supervised Action GAN | - | 0
Fine-Grained Semantic Segmentation of Motion Capture Data using Dilated Temporal Fully-Convolutional Networks | - | 0
Friends Across Time: Multi-Scale Action Segmentation Transformer for Surgical Phase Recognition | - | 0
DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation | - | 0
Grasp Type Revisited: A Modern Perspective on a Classical Feature for Vision | - | 0
Hand Guided High Resolution Feature Enhancement for Fine-Grained Atomic Action Segmentation within Complex Human Assemblies | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AdaFocus (newly extracted I3D-features, LT-Context model) | Average F1 | 76.2 | - | Unverified
2 | FACT (efficient hybrid of convolution and transformer model) | Average F1 | 74.7 | - | Unverified
3 | ASQuery | Average F1 | 74.6 | - | Unverified
4 | BIT | Average F1 | 73.7 | - | Unverified
5 | DiffAct | Average F1 | 73.6 | - | Unverified
6 | BaFormer | Average F1 | 72.4 | - | Unverified
7 | CETNet | Average F1 | 71.8 | - | Unverified
8 | SF-TMN(ASFormer) | Average F1 | 71.6 | - | Unverified
9 | RF++-SSTDA | Acc | 70.8 | - | Unverified
10 | ASPnet | Average F1 | 70.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Br-Prompt+ASPnet (RGB, flow, accelerometer) | F1@50% | 88.5 | - | Unverified
2 | Semantic2Graph | F1@50% | 87.3 | - | Unverified
3 | BaFormer | F1@50% | 83.9 | - | Unverified
4 | DiffAct | F1@50% | 83.7 | - | Unverified
5 | SF-TMN(ASFormer) | F1@50% | 82.9 | - | Unverified
6 | LTContext | F1@50% | 82 | - | Unverified
7 | UVAST | F1@50% | 81.7 | - | Unverified
8 | Br-Prompt+ASFormer | F1@50% | 81.3 | - | Unverified
9 | EUT | F1@50% | 81 | - | Unverified
10 | CETNet | F1@50% | 80.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Semantic2Graph | F1@50% | 91.3 | - | Unverified
2 | FACT | F1@50% | 87.5 | - | Unverified
3 | DiffAct | F1@50% | 84.7 | - | Unverified
4 | BaFormer | F1@50% | 83.5 | - | Unverified
5 | SF-TMN(ASFormer) | F1@50% | 83.1 | - | Unverified
6 | Br-Prompt+ASFormer | F1@50% | 83 | - | Unverified
7 | DPRN | F1@50% | 82.9 | - | Unverified
8 | BIT | F1@50% | 82.6 | - | Unverified
9 | CETNet | F1@50% | 81.3 | - | Unverified
10 | UVAST | F1@50% | 81 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | UnLoc-L | Frame accuracy | 72.8 | - | Unverified
2 | Univl | Frame accuracy | 70 | - | Unverified
3 | Norton | Frame accuracy | 69.8 | - | Unverified
4 | VideoClip | Frame accuracy | 68.7 | - | Unverified
5 | TACo | Frame accuracy | 68.4 | - | Unverified
6 | VLM | Frame accuracy | 68.4 | - | Unverified
7 | MIL-NCE | Frame accuracy | 61 | - | Unverified
8 | ActBERT | Frame accuracy | 57 | - | Unverified
9 | CBT | Frame accuracy | 53.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ASQuery | F1@10% | 37.8 | - | Unverified
2 | LTContext | F1@10% | 33.9 | - | Unverified
3 | ASFormer | F1@10% | 33.4 | - | Unverified
4 | C2F-TCN | F1@10% | 33.3 | - | Unverified
5 | UVAST | F1@10% | 32.1 | - | Unverified
6 | MS-TCN++ | F1@10% | 31.6 | - | Unverified
7 | ProTAS(Offline) | F1@10% | 28.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | RL+Tree | Edit Distance | 88.53 | - | Unverified
2 | RL (full) | Edit Distance | 87.96 | - | Unverified
3 | TricorNet | Edit Distance | 86.8 | - | Unverified
4 | SDL+SC-CRF | Edit Distance | 86.21 | - | Unverified
5 | TCN | Edit Distance | 83.1 | - | Unverified
6 | ST-CNN+Seg | Edit Distance | 66.56 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TSA (FINCH) | Acc | 62.4 | - | Unverified
2 | TSA (Kmeans) | Acc | 59.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EUT | Acc | 87.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Unsup. TW-FINCH (K=avg/activity) | Accuracy | 42 | - | Unverified
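
This page does not define the metrics named in the tables. The sketch below shows how frame accuracy, the segmental edit score, and F1@k% are commonly computed in the action segmentation literature. Whether the listed results use exactly these definitions is an assumption, and all function names and example labels are hypothetical.

```python
def to_segments(labels):
    """Collapse per-frame labels into (label, start, end) with exclusive end."""
    segs = []
    for i, lab in enumerate(labels):
        if segs and segs[-1][0] == lab:
            segs[-1] = (lab, segs[-1][1], i + 1)  # extend the current run
        else:
            segs.append((lab, i, i + 1))          # start a new run
    return segs


def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)


def edit_score(pred, gt):
    """Normalized Levenshtein similarity between segment label sequences."""
    p = [s[0] for s in to_segments(pred)]
    g = [s[0] for s in to_segments(gt)]
    prev = list(range(len(g) + 1))
    for i in range(1, len(p) + 1):
        cur = [i] + [0] * len(g)
        for j in range(1, len(g) + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return 100.0 * (1.0 - prev[-1] / max(len(p), len(g)))


def f1_at_k(pred, gt, k=0.5):
    """Segmental F1: a predicted segment is a true positive if its IoU with an
    unmatched ground-truth segment of the same label is at least k."""
    p_segs, g_segs = to_segments(pred), to_segments(gt)
    matched = [False] * len(g_segs)
    tp = 0
    for lab, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (glab, gs, ge) in enumerate(g_segs):
            if glab != lab:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= k and not matched[best_j]:
            tp += 1
            matched[best_j] = True
    fp = len(p_segs) - tp
    fn = len(g_segs) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical 8-frame example: the predicted boundary is one frame early.
gt = ["a", "a", "a", "a", "b", "b", "b", "b"]
pred = ["a", "a", "a", "b", "b", "b", "b", "b"]
print(frame_accuracy(pred, gt), edit_score(pred, gt), f1_at_k(pred, gt, 0.5))
```

In this example the one-frame boundary shift lowers frame accuracy but leaves the edit score and F1@50% at their maximum, because the order of segments is correct and both predicted segments overlap their ground-truth counterparts by more than 50%.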