The Little Prince in 26 Languages: Towards a Multilingual Neuro-Cognitive Corpus

2020-05-01 · LREC 2020

Sabrina Stehwien, Lena Henke, John Hale, Jonathan Brennan, Lars Meyer

Abstract

We present the Le Petit Prince Corpus (LPPC), a multilingual resource for research in (computational) psycho- and neurolinguistics. The corpus consists of the children's story The Little Prince in 26 languages. The dataset is being built using state-of-the-art methods for speech and language processing and electroencephalography (EEG). The planned release of the LPPC dataset will include raw text annotated with dependency graphs in the Universal Dependencies standard, a near-natural-sounding synthetic spoken subset, and EEG recordings. We will use this corpus to conduct neurolinguistic studies that generalize across a wide range of languages, overcoming the typological constraints of traditional approaches. The planned release of the LPPC combines linguistic and EEG data for many languages using fully automatic methods, and thus constitutes a readily extendable resource that supports cross-linguistic and cross-disciplinary research.
