Open Sentence Embeddings for Portuguese with the Serafim PT* encoders family

2024-07-28

Luís Gomes, António Branco, João Silva, João Rodrigues, Rodrigo Santos

Abstract

Sentence encoders encode the semantics of their input, enabling key downstream applications such as classification, clustering, and retrieval. In this paper, we present Serafim PT*, a family of open-source sentence encoders for Portuguese in various sizes, suited to different hardware and compute budgets. Each model exhibits state-of-the-art performance and is openly available under a permissive license that allows both commercial and research use. Besides the sentence encoders, this paper contributes a systematic study, and the lessons learned from it, on the criteria for selecting the learning objectives and parameters that yield top-performing encoders.
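Because a sentence encoder maps each input to a fixed-size vector, a task such as retrieval reduces to ranking stored vectors by their similarity to the query vector. The sketch below is illustrative only and is not taken from the paper: the three-dimensional vectors are made-up stand-ins for real encoder outputs (actual encoders produce far larger vectors), and cosine similarity is assumed as the scoring function. The helper names `cosine` and `retrieve` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieve(query_vec, corpus_vecs, k=1):
    """Return the indices of the k corpus vectors most similar to the query."""
    ranked = sorted(
        range(len(corpus_vecs)),
        key=lambda i: cosine(query_vec, corpus_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical embeddings: in practice each vector would come from
# running a sentence encoder over the corresponding sentence.
corpus = [
    ("O gato dorme no sofá.", [0.9, 0.1, 0.0]),
    ("A bolsa de valores caiu hoje.", [0.0, 0.2, 0.95]),
    ("Um felino descansa no sofá.", [0.85, 0.15, 0.05]),
]
corpus_vecs = [vec for _, vec in corpus]

# Hypothetical embedding of the query "Onde está o gato?"
query_vec = [0.88, 0.12, 0.02]

for i in retrieve(query_vec, corpus_vecs, k=2):
    print(corpus[i][0])
```

The same vectors can feed clustering or classification unchanged, which is why a single encoder can serve all three applications listed above.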