
MEDIBENG WHISPER TINY: A FINE-TUNED CODE-SWITCHED BENGALI-ENGLISH TRANSLATOR FOR CLINICAL APPLICATIONS

2025-04-25 · medRxiv 2025 · Code Available

Promila Ghosh, Sunipun Talukder


Abstract

Code-switching in multilingual healthcare settings challenges automated transcription systems as facilities adopt AI documentation tools. We address this problem with MediBeng Whisper Tiny, a version of OpenAI's Whisper Tiny model fine-tuned for code-switched Bengali-English conversations in clinical contexts. The model was trained on MediBeng, a synthetic dataset simulating the bilingual interactions common in healthcare environments, using only 20% of the dataset, which keeps the computational and data requirements low. Despite the limited training data, the model achieved a 0.01 word error rate (WER), reflecting near-perfect transcription accuracy, and a 0.98 BLEU score, indicating accurate translation of mixed-language input into English. These results show that complex code-switching tasks can be handled effectively even with minimal data and resources. By streamlining information processing, the model lets clinicians spend less time on paperwork and more on patient care, and it makes patient records more accurate and accessible, supporting better healthcare decision-making.
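The WER figure reported above can be illustrated with a minimal sketch. The paper does not detail its evaluation pipeline, so the function below is the standard word-level WER (edit distance between reference and hypothesis divided by reference length), not necessarily the authors' exact implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions only
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + sub_cost)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution in a four-word reference gives WER = 0.25.
print(wer("the patient has fever", "the patient had fever"))  # → 0.25
```

A WER of 0.01 therefore corresponds to roughly one word-level error per hundred reference words; BLEU, by contrast, is a precision-oriented n-gram overlap score, so the two metrics assess transcription and translation quality from complementary angles.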
