
Augmented Bio-SBERT: Improving Performance for Pairwise Sentence Tasks in Bio-medical Domain

2022-10-01 · LoResMT (COLING) 2022

Sonam Pankaj, Amit Gautam


Abstract

One of the modern challenges in AI is access to high-quality annotated data, especially in NLP, which is why data augmentation is gaining importance. Whereas image data augmentation is standard in computer vision, text data augmentation in NLP is difficult because of the complexity of language. Augmentation is most beneficial when little data is available, where it can significantly improve a model's accuracy and performance. We have implemented augmentation for pairwise sentence scoring in the biomedical domain. Experimenting on downstream tasks with biomedical data, we investigate how to improve the performance of bi-encoder sentence transformers using an augmented dataset generated by cross-encoders that are fine-tuned on BIOSSES and MedNLI, starting from the pre-trained Bio-BERT model. This significantly improves the results compared with a model trained only on gold data for the respective tasks.
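The data flow described in the abstract can be illustrated with a minimal sketch: new sentence pairs are formed from the gold sentences, a scorer assigns each new pair a "silver" label, and gold and silver pairs are merged into the training set for the bi-encoder. Everything named here is an assumption made for illustration. `toy_cross_encoder` is a hypothetical stand-in (token overlap) for the fine-tuned cross-encoder, `augment` is a hypothetical helper, the example sentences are invented, and pairing every unseen combination is only one possible sampling strategy; the abstract does not say which one the authors use. The bi-encoder training step itself is omitted.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Pair = Tuple[str, str, float]  # (sentence_a, sentence_b, similarity score)


def toy_cross_encoder(a: str, b: str) -> float:
    """Stand-in scorer (assumption): Jaccard overlap of lowercased tokens.

    In the setup the abstract describes, this role is played by a
    cross-encoder fine-tuned on the gold data.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def augment(gold: List[Pair], scorer: Callable[[str, str], float]) -> List[Pair]:
    """Return the gold pairs followed by silver pairs labeled by `scorer`."""
    # Pairs that already carry a human label, regardless of order.
    seen = {frozenset((a, b)) for a, b, _ in gold}
    # Unique sentences in first-seen order.
    sentences = list(dict.fromkeys(s for a, b, _ in gold for s in (a, b)))
    # Every combination without a gold label becomes a silver pair.
    silver = [
        (a, b, scorer(a, b))
        for a, b in combinations(sentences, 2)
        if frozenset((a, b)) not in seen
    ]
    return gold + silver


# Invented example data, not taken from BIOSSES or MedNLI.
gold = [
    ("the protein binds dna", "the protein binds rna", 0.8),
    ("cells divide rapidly", "tumor cells divide rapidly", 0.9),
]
augmented = augment(gold, toy_cross_encoder)
print(len(augmented))  # 6 pairs: 2 gold + 4 silver
```

The merged list is what a bi-encoder would then be trained on. Replacing `toy_cross_encoder` with a real fine-tuned cross-encoder changes only the `scorer` argument.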
