EMPATH: MediaPipe-Aided Ensemble Learning with Attention-Based Transformers for Accurate Recognition of Bangla Word-Level Sign Language
Kazi Reyazul Hasan, Muhammad Abdullah Adnan
Abstract
In this paper, we introduce EMPATH, a computational framework for recognizing Bangla Sign Language (BdSL) at the word level. EMPATH combines Ensemble Learning, MediaPipe Holistic for gesture tracking, and an Attention-based Transformer model, and sets new accuracy benchmarks that significantly surpass previous records: 79.81% on SignBD-Word-90 (previous best 66.05%), 70.58% on SignBD-Word (previous best 57%), and 99.25% on BdSL40 (previous best 89%). A key feature of EMPATH is its interpolation model, built to overcome the limitations posed by hand keypoints that MediaPipe fails to detect. Validated in both EMPATH and a basic MLP framework, this model proves versatile across architectures. EMPATH selects its preprocessing and postprocessing techniques to maximize their impact on accuracy and performance. Trained extensively on word-level datasets beyond Bangla, including INCLUDE-50, INCLUDE, WLASL-100, and the Malaysian Sign Language (MSL) Medical Dataset, EMPATH demonstrates broad applicability, reaching 100% on INCLUDE-50, 94.67% on INCLUDE, and 93.46% on the MSL Medical Dataset. Through EMPATH, we aspire to bridge communication barriers for the deaf and hard-of-hearing communities. Source code is available at https://github.com/kreyazulh/EMPATH.
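The abstract does not describe how the interpolation model works, so the following is only a minimal sketch of the general idea of filling in hand keypoints that are missing in some frames. It assumes a plain per-coordinate linear interpolation over time, which is a stand-in technique and not the authors' model. The function name `fill_missing_keypoints`, the `(frames, keypoints, dims)` array layout, and the use of NaN to mark undetected keypoints are all hypothetical choices made for illustration.

```python
import numpy as np


def fill_missing_keypoints(seq):
    """Fill NaN keypoint coordinates by linear interpolation over time.

    seq: array of shape (frames, keypoints, dims); NaN marks a coordinate
    that was not detected in that frame (an assumed convention).
    Returns a new array of the same shape with no NaN values.
    """
    seq = np.asarray(seq, dtype=float)
    n_frames = seq.shape[0]
    frames = np.arange(n_frames)

    # Work on a 2-D copy: one column per (keypoint, coordinate) track.
    flat = seq.reshape(n_frames, -1).copy()

    for col in range(flat.shape[1]):
        track = flat[:, col]
        valid = ~np.isnan(track)
        if not valid.any():
            # Never detected in this clip: fall back to zeros (assumption).
            flat[:, col] = 0.0
        elif not valid.all():
            # np.interp interpolates between detected frames and holds the
            # nearest detected value before the first / after the last one.
            flat[:, col] = np.interp(frames, frames[valid], track[valid])

    return flat.reshape(seq.shape)
```

Calling `fill_missing_keypoints` on a clip where a hand drops out for a few frames yields a gap-free sequence that could then be passed to a downstream classifier such as an MLP or Transformer. Linear interpolation is chosen here only because it is simple to verify; a learned interpolation model like the one the abstract mentions would replace the `np.interp` step.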