Action Segmentation

Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, it divides a temporally untrimmed video into segments along the time axis and labels each segment with one of a set of predefined action labels. The results can then serve as input to downstream applications such as video-to-text and action localization.
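To make the task definition concrete, here is a minimal sketch that assumes a model emits one label per frame. It collapses those frame labels into labeled segments and computes frame-wise accuracy. The function names `frames_to_segments` and `frame_accuracy` and the toy labels are illustrative assumptions, not taken from any paper listed here.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) runs; end is exclusive."""
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


def frame_accuracy(predicted, ground_truth):
    """Percentage of frames whose predicted label equals the ground-truth label."""
    if len(predicted) != len(ground_truth):
        raise ValueError("sequences must cover the same number of frames")
    if not ground_truth:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return 100.0 * correct / len(ground_truth)


# Toy 10-frame video (hypothetical labels).
ground_truth = ["background"] * 3 + ["cut"] * 4 + ["mix"] * 3
predicted = ["background"] * 2 + ["cut"] * 5 + ["mix"] * 3

print(frames_to_segments(predicted))
print(frame_accuracy(predicted, ground_truth))
```

In this toy case the prediction places the first boundary one frame early, so nine of the ten frames match.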

Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation

Papers


Showing 51–75 of 219 papers

Title	Date	Tasks	Status	Hype	Score
Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering	May 27, 2021	Action Segmentation, Clustering	Code Available	1	5
Pretrained Language Models as Visual Planners for Human Assistance	Apr 17, 2023	Action Segmentation, Language Modelling	Code Available	1	5
3D Pose-Based Temporal Action Segmentation for Figure Skating: A Fine-Grained and Jump Procedure-Aware Annotation Approach	Aug 29, 2024	Action Segmentation, Markerless Motion Capture	Code Available	1	5
Diffusion Action Segmentation	Mar 31, 2023	Action Segmentation, Denoising	Code Available	1	5
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks	Jun 14, 2022	Action Segmentation, Instance Segmentation	Code Available	1	5
Refining Action Segmentation With Hierarchical Video Representations	Jan 1, 2021	Action Segmentation, Segmentation	Code Available	1	5
SMC-NCA: Semantic-guided Multi-level Contrast for Semi-supervised Temporal Action Segmentation	Dec 19, 2023	Action Segmentation, Contrastive Learning	Code Available	0	5
Efficient Temporal Action Segmentation via Boundary-aware Query Voting	May 25, 2024	Action Segmentation, Instance Segmentation	Code Available	0	5
Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment	Mar 28, 2024	Action Segmentation, Segmentation	Code Available	0	5
Snippet-Aware Transformer With Multiple Action Elements for Skeleton-Based Action Segmentation	May 6, 2024	Action Segmentation, Skeleton Based Action Segmentation	Code Available	0	5
Semi-Weakly-Supervised Learning of Complex Actions From Instructional Task Videos	Jan 1, 2022	Action Segmentation, Weakly-supervised Learning	Code Available	0	5
Do we really need temporal convolutions in action segmentation?	May 26, 2022	Action Classification, Action Segmentation	Code Available	0	5
ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding	Oct 29, 2024	Action Recognition, Action Segmentation	Code Available	0	5
SigFormer: Sparse Signal-Guided Transformer for Multi-Modal Human Action Segmentation	Nov 29, 2023	Action Segmentation, Optical Flow Estimation	Code Available	0	5
A Multimodal Handover Failure Detection Dataset and Baselines	Feb 28, 2024	Action Segmentation, Object	Code Available	0	5
Toward Ergonomic Risk Prediction via Segmentation of Indoor Object Manipulation Actions Using Spatiotemporal Convolutional Networks	Feb 14, 2019	Action Segmentation	Code Available	0	5
Deep Reinforcement Learning for Surgical Gesture Segmentation and Classification	Jun 21, 2018	Action Segmentation, Classification	Code Available	0	5
Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints	Jun 2, 2017	Action Detection, Action Segmentation	Code Available	0	5
Cross-Enhancement Transformer for Action Segmentation	May 19, 2022	Action Segmentation, Decoder	Code Available	0	5
OTAS: Unsupervised Boundary Detection for Object-Centric Temporal Action Segmentation	Sep 12, 2023	Action Segmentation, Boundary Detection	Code Available	0	5
Cost-Sensitive Learning for Long-Tailed Temporal Action Segmentation	Mar 24, 2025	Action Segmentation, Segmentation	Code Available	0	5
ActBERT: Learning Global-Local Video-Text Representations	Nov 14, 2020	Action Segmentation, Question Answering	Code Available	0	5
OnlineTAS: An Online Baseline for Temporal Action Segmentation	Nov 2, 2024	Action Segmentation, Segmentation	Code Available	0	5
Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs	Dec 5, 2023	Action Segmentation, All	Code Available	0	5
Frontal Low-rank Random Tensors for Fine-grained Action Segmentation	Jun 3, 2019	Action Parsing, Action Segmentation	Code Available	0	5


Datasets: Breakfast, 50 Salads, GTEA, COIN, Assembly101, JIGSAWS, Youtube INRIA Instructional, 50Salads, MPII Cooking 2 Dataset

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AdaFocus (newly extracted I3D-features, LT-Context model)	Average F1	76.2	—	Unverified
2	FACT (efficient hybrid of convolution and transformer model)	Average F1	74.7	—	Unverified
3	ASQuery	Average F1	74.6	—	Unverified
4	BIT	Average F1	73.7	—	Unverified
5	DiffAct	Average F1	73.6	—	Unverified
6	BaFormer	Average F1	72.4	—	Unverified
7	CETNet	Average F1	71.8	—	Unverified
8	SF-TMN(ASFormer)	Average F1	71.6	—	Unverified
9	RF++-SSTDA	Acc	70.8	—	Unverified
10	ASPnet	Average F1	70.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Br-Prompt+ASPnet (RGB, flow, accelerometer)	F1@50%	88.5	—	Unverified
2	Semantic2Graph	F1@50%	87.3	—	Unverified
3	BaFormer	F1@50%	83.9	—	Unverified
4	DiffAct	F1@50%	83.7	—	Unverified
5	SF-TMN(ASFormer)	F1@50%	82.9	—	Unverified
6	LTContext	F1@50%	82	—	Unverified
7	UVAST	F1@50%	81.7	—	Unverified
8	Br-Prompt+ASFormer	F1@50%	81.3	—	Unverified
9	EUT	F1@50%	81	—	Unverified
10	CETNet	F1@50%	80.1	—	Unverified
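The F1@50% metric in the table above is a segment-level score, not a frame-level one. As commonly defined in the action segmentation literature, a predicted segment counts as a true positive when its temporal IoU with a ground-truth segment of the same label reaches the threshold, and each ground-truth segment can be matched at most once. The sketch below is a simplified illustration under that assumption. The function names and toy labels are hypothetical, and individual papers may differ in details such as background handling or how matches are resolved.

```python
from itertools import groupby


def to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) runs; end is exclusive."""
    segments, start = [], 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


def segmental_f1(predicted, ground_truth, iou_threshold=0.5):
    """Segment-level F1 (0-100) at a temporal IoU threshold (simplified sketch)."""
    pred_segs = to_segments(predicted)
    gt_segs = to_segments(ground_truth)
    matched = [False] * len(gt_segs)
    tp = fp = 0
    for label, p_start, p_end in pred_segs:
        best_iou, best_idx = 0.0, -1
        for idx, (g_label, g_start, g_end) in enumerate(gt_segs):
            if g_label != label or matched[idx]:
                continue
            inter = max(0, min(p_end, g_end) - max(p_start, g_start))
            union = max(p_end, g_end) - min(p_start, g_start)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            tp += 1
            matched[best_idx] = True
        else:
            fp += 1
    fn = matched.count(False)  # ground-truth segments nobody matched
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy example: the prediction over-segments the second action.
ground_truth = ["pour"] * 4 + ["stir"] * 6
predicted = ["pour"] * 5 + ["stir"] * 2 + ["pour"] * 1 + ["stir"] * 2

print(round(segmental_f1(predicted, ground_truth, 0.5), 2))
print(round(segmental_f1(predicted, ground_truth, 0.1), 2))
```

The toy prediction gets 70% of frames right, yet its fragmented segments are penalized heavily at the 50% threshold. This is why segment-level scores are reported alongside frame accuracy.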

#	Model	Metric	Claimed	Verified	Status
1	Semantic2Graph	F1@50%	91.3	—	Unverified
2	FACT	F1@50%	87.5	—	Unverified
3	DiffAct	F1@50%	84.7	—	Unverified
4	BaFormer	F1@50%	83.5	—	Unverified
5	SF-TMN(ASFormer)	F1@50%	83.1	—	Unverified
6	Br-Prompt+ASFormer	F1@50%	83	—	Unverified
7	DPRN	F1@50%	82.9	—	Unverified
8	BIT	F1@50%	82.6	—	Unverified
9	CETNet	F1@50%	81.3	—	Unverified
10	UVAST	F1@50%	81	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	UnLoc-L	Frame accuracy	72.8	—	Unverified
2	Univl	Frame accuracy	70	—	Unverified
3	Norton	Frame accuracy	69.8	—	Unverified
4	VideoClip	Frame accuracy	68.7	—	Unverified
5	TACo	Frame accuracy	68.4	—	Unverified
6	VLM	Frame accuracy	68.4	—	Unverified
7	MIL-NCE	Frame accuracy	61	—	Unverified
8	ActBERT	Frame accuracy	57	—	Unverified
9	CBT	Frame accuracy	53.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ASQuery	F1@10%	37.8	—	Unverified
2	LTContext	F1@10%	33.9	—	Unverified
3	ASFormer	F1@10%	33.4	—	Unverified
4	C2F-TCN	F1@10%	33.3	—	Unverified
5	UVAST	F1@10%	32.1	—	Unverified
6	MS-TCN++	F1@10%	31.6	—	Unverified
7	ProTAS(Offline)	F1@10%	28.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	RL+Tree	Edit Distance	88.53	—	Unverified
2	RL (full)	Edit Distance	87.96	—	Unverified
3	TricorNet	Edit Distance	86.8	—	Unverified
4	SDL+SC-CRF	Edit Distance	86.21	—	Unverified
5	TCN	Edit Distance	83.1	—	Unverified
6	ST-CNN+Seg	Edit Distance	66.56	—	Unverified
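The values in the Edit Distance table above look like a 0-100, higher-is-better score. That matches the normalized segmental edit score often used in this area: the Levenshtein distance between the predicted and ground-truth sequences of segment labels, normalized by the longer sequence and subtracted from 1. This reading is an assumption, since the listing does not define the metric. The sketch below shows that computation with hypothetical function names and toy labels.

```python
from itertools import groupby


def segment_labels(frame_labels):
    """Ordered list of segment labels, ignoring segment durations."""
    return [label for label, _ in groupby(frame_labels)]


def levenshtein(a, b):
    """Classic edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,             # delete x
                            curr[j - 1] + 1,         # insert y
                            prev[j - 1] + (x != y))) # substitute or match
        prev = curr
    return prev[-1]


def edit_score(predicted, ground_truth):
    """Normalized segmental edit score in [0, 100]; higher is better."""
    p = segment_labels(predicted)
    g = segment_labels(ground_truth)
    longest = max(len(p), len(g))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(p, g) / longest)


# Toy example: two spurious extra segments in the prediction.
ground_truth = ["grasp"] * 4 + ["pull"] * 6
predicted = ["grasp"] * 5 + ["pull"] * 2 + ["grasp"] * 1 + ["pull"] * 2

print(edit_score(predicted, ground_truth))
```

Because only the order of segment labels matters, this score ignores small boundary shifts and penalizes over-segmentation and ordering errors.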

#	Model	Metric	Claimed	Verified	Status
1	TSA (FINCH)	Acc	62.4	—	Unverified
2	TSA (Kmeans)	Acc	59.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	EUT	Acc	87.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Unsup. TW-FINCH (K=avg/activity)	Accuracy	42	—	Unverified