Identifying Cleartext in Historical Ciphers

2022-06-01 · LT4HALA (LREC) 2022

Maria-Elena Gambardella, Beata Megyesi, Eva Pettersson

Abstract

In historical encrypted sources we find encrypted text sequences, called ciphertext, alongside non-encrypted cleartext written in a known language. While most cryptanalysis focuses on decrypting ciphertext, cleartext is often overlooked, although it can give important clues about the historical interpretation and contextualisation of the manuscript. In this paper, we investigate to what extent we can automatically distinguish cleartext from ciphertext in historical ciphers, and to what extent we can identify the language of the cleartext. The problem is challenging because cleartext sequences in ciphers are often short, up to a few words, and appear in different languages due to historical code-switching. To identify the sequences and their language(s), we chose a rule-based approach and ran 7 different models using historical language models on various ciphertexts.
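
The abstract does not specify the rules or the language models, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a character-bigram "language model" per language and a simple rule: a token counts as cleartext if enough of its bigrams are attested in some language's model, and is otherwise treated as ciphertext. The function names, the toy word lists, and the 0.8 threshold are all hypothetical.

```python
from collections import Counter

def char_bigrams(word):
    """Return the character bigrams of a word, padded with boundary markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

def train(words):
    """Count bigrams over a toy word list standing in for one historical language model."""
    counts = Counter()
    for w in words:
        counts.update(char_bigrams(w))
    return counts

def coverage(token, model):
    """Fraction of the token's bigrams that are attested in the model."""
    grams = char_bigrams(token)
    return sum(1 for g in grams if g in model) / len(grams)

def classify(token, models, threshold=0.8):
    """Return the best-matching language, or 'cipher' if no model reaches the threshold."""
    # Assumed rule: tokens with digits or symbols (or empty tokens) are ciphertext.
    if not token.isalpha():
        return "cipher"
    best_lang, best_score = max(
        ((lang, coverage(token, m)) for lang, m in models.items()),
        key=lambda pair: pair[1],
    )
    return best_lang if best_score >= threshold else "cipher"

# Hypothetical toy data; a real system would train on historical corpora.
models = {
    "latin": train(["dominus", "rex", "et", "in", "nomine", "pax"]),
    "italian": train(["signore", "il", "che", "della", "pace", "molto"]),
}
labels = [classify(t, models) for t in ["nomine", "della", "4732", "xqzt"]]
print(labels)
```

With word lists this small the classifier only recognises tokens that closely resemble its training words, which mirrors the difficulty the abstract notes: short cleartext sequences give a language model very little evidence to work with.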
