SOTAVerified

Action Segmentation

Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, Action Segmentation aims to partition a temporally untrimmed video along the time axis and label each resulting segment with one of a set of pre-defined action labels. The results of Action Segmentation can then serve as input to downstream applications, such as video-to-text and action localization.

Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
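
As a rough illustration of the task definition above, the sketch below collapses a sequence of per-frame action labels into labeled temporal segments. It assumes a model has already predicted one label per frame; the function name `frames_to_segments` and the label strings are hypothetical and are not taken from any listed paper.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) segments.

    `start` is inclusive and `end` is exclusive, both in frame indices.
    This is an illustrative helper, not a method from a specific paper.
    """
    segments = []
    start = 0
    # groupby yields one group per run of consecutive identical labels
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical per-frame predictions for a six-frame untrimmed clip
frame_labels = ["pour", "pour", "pour", "stir", "stir", "pour"]
segments = frames_to_segments(frame_labels)
print(segments)
# [('pour', 0, 3), ('stir', 3, 5), ('pour', 5, 6)]
```

Each tuple is one segmented part of the video together with its action label, which is the form of output that downstream applications would consume.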

Papers

Showing 1-10 of 219 papers

Title | Status | Hype
Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding |  | 0
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Code | 0
EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models | Code | 1
M2R2: MulitModal Robotic Representation for Temporal Action Segmentation |  | 0
Uni4D: A Unified Self-Supervised Learning Framework for Point Cloud Videos |  | 0
Towards Generalizing Temporal Action Segmentation to Unseen Views |  | 0
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning |  | 0
Cost-Sensitive Learning for Long-Tailed Temporal Action Segmentation | Code | 0
Condensing Action Segmentation Datasets via Generative Network Inversion |  | 0
End-to-End Action Segmentation Transformer |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AdaFocus (newly extracted I3D-features, LT-Context model) | Average F1 | 76.2 |  | Unverified
2 | FACT (efficient hybrid of convolution and transformer model) | Average F1 | 74.7 |  | Unverified
3 | ASQuery | Average F1 | 74.6 |  | Unverified
4 | BIT | Average F1 | 73.7 |  | Unverified
5 | DiffAct | Average F1 | 73.6 |  | Unverified
6 | BaFormer | Average F1 | 72.4 |  | Unverified
7 | CETNet | Average F1 | 71.8 |  | Unverified
8 | SF-TMN(ASFormer) | Average F1 | 71.6 |  | Unverified
9 | RF++-SSTDA | Acc | 70.8 |  | Unverified
10 | ASPnet | Average F1 | 70.6 |  | Unverified