Implicit readability ranking using the latent variable of a Bayesian Probit model

2016-12-01 · WS 2016

Johan Falkenjack, Arne Jönsson

Abstract

Data-driven approaches to readability analysis for languages other than English have been plagued by a scarcity of suitable corpora. Often, relevant corpora consist only of easy-to-read texts with no rank information or empirical readability scores, making only binary approaches, such as classification, applicable. We propose a Bayesian latent-variable approach to get the most out of these kinds of corpora. In this paper we present results on using such a model for readability ranking. The model is evaluated on a preliminary corpus of ranked student texts with encouraging results. We also assess the model by showing that it performs readability classification on par with a state-of-the-art classifier while at the same time being transparent enough to allow more sophisticated interpretations.
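
The core idea, ranking texts by the latent variable of a probit classifier instead of by its binary output, can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the two features, the weight values, and the intercept are hypothetical stand-ins for posterior means, and the abstract does not describe the paper's actual features or inference procedure.

```python
from statistics import NormalDist

# Hypothetical weights standing in for posterior means of a fitted
# probit model; these values are NOT taken from the paper.
WEIGHTS = {"avg_sentence_len": -0.15, "avg_word_len": -0.8}
INTERCEPT = 6.0


def latent_score(features):
    """Linear predictor of the probit model (mean of the latent variable)."""
    return INTERCEPT + sum(WEIGHTS[k] * features[k] for k in WEIGHTS)


def prob_easy(features):
    """Probit link: P(easy-to-read) = Phi(latent score)."""
    return NormalDist().cdf(latent_score(features))


def rank_by_readability(texts):
    """Order text ids from most to least readable by latent score."""
    return sorted(texts, key=lambda name: latent_score(texts[name]), reverse=True)


# Hypothetical feature values for three texts.
texts = {
    "a": {"avg_sentence_len": 10.0, "avg_word_len": 4.0},
    "b": {"avg_sentence_len": 18.0, "avg_word_len": 5.0},
    "c": {"avg_sentence_len": 25.0, "avg_word_len": 6.0},
}
ranking = rank_by_readability(texts)
```

The binary classifier only reports whether `prob_easy` exceeds 0.5, whereas sorting on `latent_score` yields a full ordering from the same model, which is why a corpus with only binary labels can still support ranking.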

Tasks

Classification, General Classification
