Gloss-Free Sign Language Translation: An Unbiased Evaluation of Progress in the Field
Ozge Mercanoglu Sincan, Jian He Low, Sobhan Asasi, Richard Bowden
Abstract
Sign Language Translation (SLT) aims to automatically convert visual sign language videos into spoken language text and vice versa. While recent years have seen rapid progress, the true sources of performance improvements often remain unclear. Do reported gains come from methodological novelty, or from the choice of a different backbone, training optimizations, hyperparameter tuning, or even differences in how evaluation metrics are calculated? This paper presents a comprehensive study of recent gloss-free SLT models by re-implementing key contributions in a unified codebase. We ensure fair comparison by standardizing preprocessing, video encoders, and training setups across all methods. Our analysis shows that many of the performance gains reported in the literature diminish when models are evaluated under consistent conditions, suggesting that implementation details and evaluation setups play a significant role in determining results. We make the codebase publicly available at https://github.com/ozgemercanoglu/sltbaselines to support transparency and reproducibility in SLT research.
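As a minimal illustration of why standardized evaluation matters (this is a hypothetical sketch, not the authors' evaluation code), even a simple unigram-precision metric shifts noticeably depending on preprocessing choices such as casing and punctuation tokenization — one concrete way differently computed metrics can make otherwise comparable systems look different:

```python
# Sketch: the same hypothesis/reference pair scored under two tokenization
# schemes. All function names here are illustrative, not from the paper.
from collections import Counter
import re

def unigram_precision(hyp_tokens, ref_tokens):
    """Fraction of hypothesis tokens matched in the reference (clipped counts)."""
    hyp_counts, ref_counts = Counter(hyp_tokens), Counter(ref_tokens)
    overlap = sum(min(c, ref_counts[t]) for t, c in hyp_counts.items())
    return overlap / max(len(hyp_tokens), 1)

def raw_split(text):
    # Naive whitespace tokenization: case- and punctuation-sensitive.
    return text.split()

def normalized_split(text):
    # Lowercase and split punctuation into separate tokens.
    return re.findall(r"\w+|[^\w\s]", text.lower())

hyp = "The weather will be sunny tomorrow."
ref = "the weather will be sunny tomorrow ."

p_raw = unigram_precision(raw_split(hyp), raw_split(ref))
p_norm = unigram_precision(normalized_split(hyp), normalized_split(ref))
# p_raw penalizes "The" (case) and "tomorrow." (attached period);
# p_norm scores the identical translation as a perfect match.
```

The same effect, amplified across 4-gram BLEU and thousands of test sentences, is why standardizing the metric computation is part of the unified comparison described above.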