Efficient Temporal Action Segmentation via Boundary-aware Query Voting
Peiyao Wang, Yuewei Lin, Erik Blasch, Jie Wei, Haibin Ling
Code
- github.com/peiyao-w/baformer (official, linked in paper, ★ 6)
Abstract
Although the performance of Temporal Action Segmentation (TAS) has improved in recent years, promising results often come at a high computational cost due to dense inputs, complex model structures, and resource-intensive post-processing. To improve efficiency while maintaining performance, we present a novel perspective centered on per-segment classification. By harnessing the capabilities of Transformers, we tokenize each video segment as an instance token, endowed with intrinsic instance segmentation. To realize efficient action segmentation, we introduce BaFormer, a boundary-aware Transformer network. It employs instance queries for instance segmentation and a global query for class-agnostic boundary prediction, yielding continuous segment proposals. During inference, BaFormer employs a simple yet effective voting strategy to classify boundary-wise segments based on instance segmentation. Remarkably, as a single-stage approach, BaFormer significantly reduces computational cost, requiring only 6% of the running time of the state-of-the-art method DiffAct, while achieving better or comparable accuracy on several popular benchmarks. The code for this project is publicly available at https://github.com/peiyao-w/BaFormer.
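The inference step described in the abstract (boundaries define segment proposals, and instance segmentation votes on each proposal's class) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `find_boundaries` and `vote_segments`, the peak-picking rule, the 0.5 threshold, and the choice to sum per-frame class scores within a segment are all assumptions made for illustration.

```python
import numpy as np

def find_boundaries(boundary_prob, threshold=0.5):
    """Return cut points [0, ..., T] from a class-agnostic boundary curve.

    Assumption: a boundary is a local peak whose probability reaches `threshold`.
    """
    T = len(boundary_prob)
    cuts = [0]
    for t in range(1, T - 1):
        p = boundary_prob[t]
        if p >= threshold and p >= boundary_prob[t - 1] and p > boundary_prob[t + 1]:
            cuts.append(t)
    cuts.append(T)
    return cuts

def vote_segments(query_masks, query_class_prob, boundary_prob, threshold=0.5):
    """Assign one class per boundary-wise segment by voting.

    query_masks:      (N, T) soft mask of each instance query over T frames
    query_class_prob: (N, C) class probabilities of each instance query
    boundary_prob:    (T,)   boundary probability from the global query
    """
    # Per-frame class scores: each query contributes its class distribution,
    # weighted by how strongly its mask covers the frame.
    frame_scores = query_masks.T @ query_class_prob  # (T, C)
    cuts = find_boundaries(boundary_prob, threshold)
    labels = np.empty(len(boundary_prob), dtype=int)
    for start, end in zip(cuts[:-1], cuts[1:]):
        # Hypothetical voting rule: sum the frame scores inside the segment
        # and give every frame in the segment the winning class.
        labels[start:end] = frame_scores[start:end].sum(axis=0).argmax()
    return labels, cuts
```

Because each segment receives a single label, an isolated frame whose own scores favor another class is overruled by the rest of its segment, which is one way such a scheme can avoid a separate smoothing or post-processing stage.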
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 50 Salads | BaFormer | F1@50% | 83.9 | — | Unverified |
| Breakfast | BaFormer | Average F1 | 72.4 | — | Unverified |
| GTEA | BaFormer | F1@50% | 83.5 | — | Unverified |
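For readers unfamiliar with the F1@50% metric in the table, the sketch below shows how a segmental F1 score at an IoU threshold is commonly computed in the action segmentation literature. It is an illustrative assumption about the evaluation protocol, not code taken from the BaFormer repository; the helper names `segments` and `f1_at_k` are hypothetical, and background-class handling is omitted.

```python
def segments(labels):
    """Collapse a frame-wise label sequence into (label, start, end) runs."""
    segs, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segs.append((labels[start], start, t))
            start = t
    return segs

def f1_at_k(pred, gt, iou_threshold=0.5):
    """Segmental F1 (in percent) at the given temporal IoU threshold."""
    p_segs, g_segs = segments(pred), segments(gt)
    matched = [False] * len(g_segs)
    tp = fp = 0
    for label, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (g_label, gs, ge) in enumerate(g_segs):
            if g_label != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_j = iou, j
        # A predicted segment is a true positive if it overlaps an unmatched
        # ground-truth segment of the same class with sufficient IoU.
        if best_j >= 0 and best_iou >= iou_threshold and not matched[best_j]:
            tp += 1
            matched[best_j] = True
        else:
            fp += 1
    fn = matched.count(False)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)
```

Under this definition, short spurious segments count as false positives even when frame-wise accuracy is high, so the metric penalizes over-segmentation.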