
Frame-Level Real-Time Assessment of Stroke Rehabilitation Exercises from Video-Level Labeled Data: Task-Specific vs. Foundation Models

2025-06-04

Gonçalo Mesquita, Ana Rita Cóias, Artur Dubrawski, Alexandre Bernardino


Abstract

The growing demands of stroke rehabilitation have increased the need for solutions that support autonomous exercising. Virtual coaches can provide real-time exercise feedback from video data, helping patients improve motor function and stay engaged. However, training real-time motion analysis systems demands frame-level annotations, which are time-consuming and costly to obtain. In this work, we present a framework that learns to classify individual frames from video-level annotations for real-time assessment of compensatory motions in rehabilitation exercises. We use a gradient-based technique and a pseudo-label selection method to create frame-level pseudo-labels for training a frame-level classifier. We leverage pre-trained task-specific models (Action Transformer and SkateFormer) and a foundation model (MOMENT) for pseudo-label generation, aiming to improve generalization to new patients. To validate the approach, we use the SERE dataset, in which 18 post-stroke patients perform five rehabilitation exercises annotated for compensatory motions. MOMENT achieves better video-level assessment results (AUC = 73%) than the baseline LSTM (AUC = 58%). For frame-level assessment, the Action Transformer with the Integrated Gradients technique leads to better outcomes (AUC = 72%), outperforming the baseline trained with ground-truth frame-level labels (AUC = 69%). We show that our proposed approach with pre-trained models enhances generalization and facilitates customization to new patients, reducing the demands of data labeling.
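The idea of turning a video-level prediction into frame-level pseudo-labels with a gradient-based attribution can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the classifier is a toy mean-pooled logistic model rather than Action Transformer, SkateFormer, or MOMENT; the Integrated Gradients baseline is all zeros; and the selection rule (keep frames whose attribution exceeds a fraction of the maximum, only in videos labeled positive) is a hypothetical stand-in for the pseudo-label selection method mentioned in the abstract.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def video_score(frames, w):
    """Toy video-level classifier (assumption): mean-pooled linear score + sigmoid."""
    z = sum(sum(wd * xd for wd, xd in zip(w, frame)) for frame in frames) / len(frames)
    return sigmoid(z)

def video_score_grad(frames, w):
    """Analytic gradient of video_score w.r.t. every frame feature."""
    s = video_score(frames, w)
    row = [s * (1.0 - s) * wd / len(frames) for wd in w]
    return [row[:] for _ in frames]

def integrated_gradients(frames, w, steps=50):
    """Integrated Gradients with an all-zero baseline (midpoint Riemann sum)."""
    avg_grad = [[0.0] * len(w) for _ in frames]
    for k in range(steps):
        alpha = (k + 0.5) / steps
        scaled = [[alpha * xd for xd in frame] for frame in frames]
        for t, row in enumerate(video_score_grad(scaled, w)):
            for d, g in enumerate(row):
                avg_grad[t][d] += g / steps
    # Attribution = (input - baseline) * average gradient along the path.
    return [[xd * gd for xd, gd in zip(frame, row)]
            for frame, row in zip(frames, avg_grad)]

def frame_pseudo_labels(frames, w, video_label, fraction=0.5):
    """Hypothetical selection rule: in a positive video, mark frames whose
    summed attribution exceeds `fraction` of the largest frame attribution."""
    if video_label == 0:
        return [0] * len(frames)  # negative video: no frame is marked
    attr = [sum(row) for row in integrated_gradients(frames, w)]
    threshold = fraction * max(attr)
    return [1 if a > threshold else 0 for a in attr]

# Six frames with two features each; only the last three carry signal.
frames = [[0.0, 0.0]] * 3 + [[2.0, 2.0]] * 3
w = [1.0, 0.5]
labels = frame_pseudo_labels(frames, w, video_label=1)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this sketch the attributions sum (approximately) to the difference between the score of the input and the score of the baseline, which is the completeness property that makes Integrated Gradients a natural way to distribute a video-level decision over frames. The resulting pseudo-labels would then serve as training targets for a separate frame-level classifier.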
