The Classical Language Toolkit: An NLP Framework for Pre-Modern Languages

2021-08-01 · ACL 2021 · Code Available

Kyle P. Johnson, Patrick J. Burns, John Stewart, Todd Cook, Clément Besnier, William J. B. Mattingly

Abstract

This paper announces version 1.0 of the Classical Language Toolkit (CLTK), an NLP framework for pre-modern languages. The vast majority of NLP, its algorithms and software, is created with assumptions particular to living languages, thus neglecting certain important characteristics of largely non-spoken historical languages. Further, scholars of pre-modern languages often have different goals than those of living-language researchers. To fill this void, the CLTK adapts ideas from several leading NLP frameworks to create a novel software architecture that satisfies the unique needs of pre-modern languages and their researchers. Its centerpiece is a modular processing pipeline that balances the competing demands of algorithmic diversity with pre-configured defaults. The CLTK currently provides pipelines, including models, for almost 20 languages.
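The idea of a modular pipeline with pre-configured defaults can be illustrated with a small, self-contained sketch. This is not the CLTK's actual API: the names `Doc`, `Process`, `DEFAULT_PIPELINES`, `analyze`, and the three toy processes are all hypothetical, chosen only to show how per-language defaults and user-supplied process lists might coexist.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical container that each process reads from and writes to."""
    text: str
    tokens: List[str] = field(default_factory=list)
    annotations: Dict[str, object] = field(default_factory=dict)


# A process is any callable that takes a Doc and returns a Doc (assumption).
Process = Callable[[Doc], Doc]


def whitespace_tokenize(doc: Doc) -> Doc:
    # Toy tokenizer: split on whitespace only.
    doc.tokens = doc.text.split()
    return doc


def lowercase_tokens(doc: Doc) -> Doc:
    # Toy normalizer: lowercase every token.
    doc.tokens = [token.lower() for token in doc.tokens]
    return doc


def count_tokens(doc: Doc) -> Doc:
    # Toy annotator: record how many tokens the document has.
    doc.annotations["n_tokens"] = len(doc.tokens)
    return doc


# Pre-configured defaults, keyed by a language code (illustrative only).
DEFAULT_PIPELINES: Dict[str, List[Process]] = {
    "lat": [whitespace_tokenize, lowercase_tokens, count_tokens],
}


def analyze(text: str, language: str,
            processes: Optional[List[Process]] = None) -> Doc:
    """Run the default pipeline for `language`, or a custom process list."""
    if processes is None:
        processes = DEFAULT_PIPELINES[language]
    doc = Doc(text=text)
    for process in processes:
        doc = process(doc)
    return doc


# Default pipeline for the language.
default_doc = analyze("Gallia est omnis divisa", "lat")

# Custom pipeline: the caller swaps in their own list of processes.
custom_doc = analyze("Gallia est", "lat", processes=[whitespace_tokenize])
```

In this sketch a caller who only wants sensible output relies on the default list, while a caller who needs a different algorithm passes an explicit `processes` list; each process stays independent because it only communicates through the shared `Doc`.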
