Action Segmentation
Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, Action Segmentation takes a temporally untrimmed video, splits it into temporal segments, and labels each segment with one of a set of pre-defined action labels. The resulting segmentation can then serve as input to downstream applications such as video-to-text and action localization.
Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
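The task description above can be made concrete with a small sketch. It assumes the model's output is a list with one predicted label per frame (a common but not universal representation) and collapses consecutive identical labels into segments. The function name `frames_to_segments` and the example labels are hypothetical, chosen only for illustration.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) segments.

    `start` is inclusive and `end` is exclusive, both in frame indices.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical per-frame predictions for an 8-frame video.
frames = ["bg", "bg", "cut", "cut", "cut", "mix", "mix", "bg"]
for label, start, end in frames_to_segments(frames):
    print(f"{label}: frames {start} to {end - 1}")
```

Each tuple is one labeled temporal segment, which is the form most downstream uses (and the segment-level metrics in the tables below) operate on.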
Datasets: Breakfast, 50 Salads, GTEA, COIN, Assembly101, JIGSAWS, Youtube INRIA Instructional, 50Salads, MPII Cooking 2 Dataset
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AdaFocus (newly extracted I3D-features, LT-Context model) | Average F1 | 76.2 | — | Unverified |
| 2 | FACT (efficient hybrid of convolution and transformer model) | Average F1 | 74.7 | — | Unverified |
| 3 | ASQuery | Average F1 | 74.6 | — | Unverified |
| 4 | BIT | Average F1 | 73.7 | — | Unverified |
| 5 | DiffAct | Average F1 | 73.6 | — | Unverified |
| 6 | BaFormer | Average F1 | 72.4 | — | Unverified |
| 7 | CETNet | Average F1 | 71.8 | — | Unverified |
| 8 | SF-TMN(ASFormer) | Average F1 | 71.6 | — | Unverified |
| 9 | RF++-SSTDA | Acc | 70.8 | — | Unverified |
| 10 | ASPnet | Average F1 | 70.6 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Br-Prompt+ASPnet (RGB, flow, accelerometer) | F1@50% | 88.5 | — | Unverified |
| 2 | Semantic2Graph | F1@50% | 87.3 | — | Unverified |
| 3 | BaFormer | F1@50% | 83.9 | — | Unverified |
| 4 | DiffAct | F1@50% | 83.7 | — | Unverified |
| 5 | SF-TMN(ASFormer) | F1@50% | 82.9 | — | Unverified |
| 6 | LTContext | F1@50% | 82 | — | Unverified |
| 7 | UVAST | F1@50% | 81.7 | — | Unverified |
| 8 | Br-Prompt+ASFormer | F1@50% | 81.3 | — | Unverified |
| 9 | EUT | F1@50% | 81 | — | Unverified |
| 10 | CETNet | F1@50% | 80.1 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Semantic2Graph | F1@50% | 91.3 | — | Unverified |
| 2 | FACT | F1@50% | 87.5 | — | Unverified |
| 3 | DiffAct | F1@50% | 84.7 | — | Unverified |
| 4 | BaFormer | F1@50% | 83.5 | — | Unverified |
| 5 | SF-TMN(ASFormer) | F1@50% | 83.1 | — | Unverified |
| 6 | Br-Prompt+ASFormer | F1@50% | 83 | — | Unverified |
| 7 | DPRN | F1@50% | 82.9 | — | Unverified |
| 8 | BIT | F1@50% | 82.6 | — | Unverified |
| 9 | CETNet | F1@50% | 81.3 | — | Unverified |
| 10 | UVAST | F1@50% | 81 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | UnLoc-L | Frame accuracy | 72.8 | — | Unverified |
| 2 | UniVL | Frame accuracy | 70 | — | Unverified |
| 3 | Norton | Frame accuracy | 69.8 | — | Unverified |
| 4 | VideoCLIP | Frame accuracy | 68.7 | — | Unverified |
| 5 | TACo | Frame accuracy | 68.4 | — | Unverified |
| 6 | VLM | Frame accuracy | 68.4 | — | Unverified |
| 7 | MIL-NCE | Frame accuracy | 61 | — | Unverified |
| 8 | ActBERT | Frame accuracy | 57 | — | Unverified |
| 9 | CBT | Frame accuracy | 53.9 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | RL+Tree | Edit Distance | 88.53 | — | Unverified |
| 2 | RL (full) | Edit Distance | 87.96 | — | Unverified |
| 3 | TricorNet | Edit Distance | 86.8 | — | Unverified |
| 4 | SDL+SC-CRF | Edit Distance | 86.21 | — | Unverified |
| 5 | TCN | Edit Distance | 83.1 | — | Unverified |
| 6 | ST-CNN+Seg | Edit Distance | 66.56 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TSA (FINCH) | Acc | 62.4 | — | Unverified |
| 2 | TSA (Kmeans) | Acc | 59.7 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EUT | Acc | 87.4 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Unsup. TW-FINCH (K=avg/activity) | Accuracy | 42 | — | Unverified |
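
The metric names in the tables above (frame accuracy, edit distance, F1@50%) are not defined on this page. The sketch below shows how they are commonly computed in the action segmentation literature, assuming per-frame integer or string labels for both prediction and ground truth. The function names are hypothetical, and individual papers may differ in details such as background handling or how overlap is measured, so treat this as an illustration rather than the protocol behind any listed number.

```python
from itertools import groupby


def to_segments(labels):
    """Collapse per-frame labels into (label, start, end) with exclusive end."""
    segments, start = [], 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label matches the ground truth."""
    correct = sum(p == g for p, g in zip(pred, gt))
    return 100.0 * correct / len(gt)


def edit_score(pred, gt):
    """Segmental edit score: 100 * (1 - normalized Levenshtein distance)
    between the ordered segment-label sequences, ignoring segment durations."""
    p = [label for label, _ in groupby(pred)]
    g = [label for label, _ in groupby(gt)]
    prev = list(range(len(g) + 1))
    for i, p_label in enumerate(p, 1):
        cur = [i]
        for j, g_label in enumerate(g, 1):
            cur.append(min(prev[j] + 1,                          # deletion
                           cur[j - 1] + 1,                       # insertion
                           prev[j - 1] + (p_label != g_label)))  # substitution
        prev = cur
    return 100.0 * (1.0 - prev[-1] / max(len(p), len(g)))


def f1_at_overlap(pred, gt, threshold=0.5):
    """Segmental F1: a predicted segment is a true positive if its overlap
    ratio with a not-yet-matched ground-truth segment of the same label
    reaches `threshold`. The denominator is the span covered by both segments."""
    p_segs, g_segs = to_segments(pred), to_segments(gt)
    matched = [False] * len(g_segs)
    tp = fp = 0
    for label, ps, pe in p_segs:
        best_ratio, best_j = 0.0, -1
        for j, (g_label, gs, ge) in enumerate(g_segs):
            if g_label != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            span = max(pe, ge) - min(ps, gs)
            ratio = inter / span
            if ratio > best_ratio:
                best_ratio, best_j = ratio, j
        if best_j >= 0 and best_ratio >= threshold and not matched[best_j]:
            tp += 1
            matched[best_j] = True
        else:
            fp += 1
    fn = matched.count(False)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)
```

Frame accuracy rewards long correctly labeled stretches even when the prediction is fragmented, while the edit score and segmental F1 penalize over-segmentation, which is why leaderboards often report them side by side.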