
Transfer Learning for a Letter-Ngrams to Word Decoder in the Context of Historical Handwriting Recognition with Scarce Resources

2018-08-01 · COLING 2018

Adeline Granet, Emmanuel Morin, Harold Mouchère, Solen Quiniou, Christian Viard-Gaudin


Abstract

Lack of data can be an issue when beginning a new study on historical handwritten documents. To deal with this, we present the character-based decoder part of a multilingual approach based on transductive transfer learning for a historical handwriting recognition task on Italian Comedy Registers. From a vector of letter-ngrams, the decoder must build the sequence of characters that corresponds to a word. As learning data, we created a new dataset from untapped resources that covers the same domain and period as our Italian Comedy data, as well as resources from common domains, periods, or languages. We obtain a 97.42% Character Recognition Rate and an 86.57% Word Recognition Rate on our Italian Comedy data, despite a lexical coverage of 67% between the Italian Comedy data and the training data. These results show that an efficient system can be obtained by carefully selecting the datasets used for the transfer learning.
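To make two notions from the abstract concrete, the following minimal Python sketch shows one plausible way to build a letter-ngram count vector for a word and to measure lexical coverage between two word lists. It is an illustration only, not the authors' implementation: the function names, the n-gram range (1 to 3), and the choice to compute coverage over distinct word types are all assumptions that the abstract does not specify.

```python
from collections import Counter


def letter_ngrams(word, n_max=3):
    """Count every letter n-gram of length 1..n_max in a word.

    Assumption: the n-gram range is illustrative; the abstract does not
    state which n-gram orders the decoder input uses.
    """
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def lexical_coverage(target_words, training_words):
    """Fraction of distinct target words that also occur in the training data.

    Assumption: coverage is computed over word types, not tokens.
    """
    target_vocab = set(target_words)
    train_vocab = set(training_words)
    if not target_vocab:
        return 0.0
    return len(target_vocab & train_vocab) / len(target_vocab)
```

For instance, `letter_ngrams("casa", n_max=2)` counts the unigrams `c`, `a` (twice), and `s`, plus the bigrams `ca`, `as`, and `sa`. A decoder of the kind described would have to recover the character sequence "casa" from such an unordered bag of n-grams.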
