Japanese SimCSE Technical Report
2023-10-30
Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda
Code: github.com/hpprc/simple-simcse-ja (official implementation, PyTorch)
Abstract
We report the development of Japanese SimCSE, Japanese sentence embedding models fine-tuned with SimCSE. Because there are few sentence embedding models for Japanese that can serve as baselines in sentence embedding research, we conducted extensive experiments on Japanese sentence embeddings involving 24 pre-trained Japanese or multilingual language models, five supervised datasets, and four unsupervised datasets. In this report, we provide the detailed training setup for the Japanese SimCSE models and their evaluation results.