Itihasa: A large-scale corpus for Sanskrit to English translation

2021-06-06 · ACL (WAT) 2021

Rahul Aralikatte, Miryam de Lhoneux, Anoop Kunchukuttan, Anders Søgaard

Abstract

This work introduces Itihasa, a large-scale translation dataset containing 93,000 pairs of Sanskrit shlokas and their English translations. The shlokas are extracted from two Indian epics, the Ramayana and the Mahabharata. We first describe the motivation behind curating such a dataset and follow up with an empirical analysis to bring out its nuances. We then benchmark the performance of standard translation models on this corpus and show that even state-of-the-art transformer architectures perform poorly, which underlines the complexity of the dataset.

Tasks

Machine Translation, Translation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Itihasa	Baseline (en->sn)	SacreBLEU	7.59	—	Unverified
Itihasa	Baseline (sn->en)	SacreBLEU	7.49	—	Unverified
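
For readers unfamiliar with the metric, the following is a minimal sketch of how a corpus-level BLEU score on the 0 to 100 scale used in the table can be computed. It is not the SacreBLEU implementation and will not reproduce the claimed numbers: it assumes plain whitespace tokenization, a single reference per hypothesis, and no smoothing, whereas SacreBLEU applies its own standardized tokenization and smoothing. The function names `ngrams` and `corpus_bleu` are illustrative, not taken from the paper or from any library.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU (0-100), one reference per hypothesis.

    Assumption: sentences are tokenized by whitespace only.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens = hyp.split()
        ref_tokens = ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Counter intersection keeps the minimum count per n-gram,
            # which is exactly BLEU's "clipped" match count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    if hyp_len >= ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on a mat"]
    print(round(corpus_bleu(hyps, refs), 2))
```

Because the statistics are pooled over the whole corpus before the precisions are computed, this is a corpus-level score rather than an average of sentence-level scores, which matches how BLEU is conventionally reported.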
