Toward Ergonomic Risk Prediction via Segmentation of Indoor Object Manipulation Actions Using Spatiotemporal Convolutional Networks
Behnoosh Parsa, Ekta U. Samani, Rose Hendrix, Cameron Devine, Shashi M. Singh, Santosh Devasia, Ashis G. Banerjee
Code: github.com/BehnooshParsa/HumanActionRecognition_with_ErgonomicRisk
Abstract
Automated real-time prediction of the ergonomic risks of manipulating objects is a key unsolved challenge in developing effective human-robot collaboration systems for logistics and manufacturing applications. We present a foundational paradigm to address this challenge by formulating the problem as one of action segmentation from RGB-D camera videos. Spatial features are first learned from the video frames using a deep convolutional model; these features are then fed sequentially to temporal convolutional networks that semantically segment the frames into a hierarchy of actions, each of which is ergonomically safe, requires monitoring, or needs immediate attention. For performance evaluation, in addition to an open-source kitchen dataset, we collected a new dataset comprising twenty individuals picking up objects of varying weights from, and placing them at, cabinet and table locations of various heights. Results show very high (87-94%) F1 overlap scores between the ground-truth and predicted frame labels for videos lasting over two minutes and containing a large number of actions.
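The pipeline described above (per-frame spatial features passed through a temporal convolution to produce frame-wise action labels) can be illustrated with a minimal sketch. This is not the paper's model: the actual system uses a deep CNN feature extractor and multi-layer temporal convolutional networks, whereas the sketch below assumes precomputed feature vectors and a single hypothetical temporal convolution layer, implemented with numpy for clarity.

```python
import numpy as np

def temporal_conv(features, weights, bias):
    """Apply one 1D temporal convolution over per-frame feature vectors.

    features: (T, D) array -- one D-dim spatial feature per video frame
    weights:  (K, D, C) array -- kernel of temporal width K mapping D feature
              dims to C action classes (hypothetical dimensions)
    bias:     (C,) array
    Returns a (T, C) array of per-frame class scores; the sequence is
    zero-padded so every frame gets a score.
    """
    T, D = features.shape
    K, _, C = weights.shape
    pad = K // 2
    padded = np.pad(features, ((pad, pad), (0, 0)))
    scores = np.zeros((T, C))
    for t in range(T):
        window = padded[t:t + K]  # (K, D) temporal window centered on frame t
        # Sum over the temporal (k) and feature (d) axes to get class scores.
        scores[t] = np.einsum('kd,kdc->c', window, weights) + bias
    return scores

def segment(features, weights, bias):
    """Frame-wise action segmentation: argmax over per-frame class scores."""
    return temporal_conv(features, weights, bias).argmax(axis=1)
```

Because the convolution looks at a window of neighboring frames, each frame's label depends on its temporal context rather than on that frame alone, which is what lets such models produce coherent action segments instead of noisy per-frame classifications.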