A Three-Parameter Rank-Frequency Relation in Natural Languages
2020-07-01 · ACL 2020
Chenchen Ding, Masao Utiyama, Eiichiro Sumita
Abstract
We show that the rank-frequency relation in textual data follows $f \propto r^{-\alpha}(r+\gamma)^{-\beta}$, where $f$ is the token frequency and $r$ is the rank by frequency, with $(\alpha, \beta, \gamma)$ as parameters. The formulation is derived from the empirical observation that $d^2(x+y)/dx^2$ is a typical impulse function, where $(x, y) = (\log r, \log f)$. The formulation reduces to the power law when $\beta = 0$ and to the Zipf-Mandelbrot law when $\alpha = 0$. Based on an investigation of multilingual corpora, we illustrate that $\alpha$ is related to the analytic features of syntax and $\beta + \gamma$ to those of morphology in natural languages.
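As a rough illustration of the formula in the abstract, the Python sketch below evaluates the unnormalized relation f ∝ r^(-α)(r+γ)^(-β) and its curvature in log-log coordinates. The function names, the `scale` argument, and the parameter values are assumptions chosen for illustration; they are not taken from the paper. The curvature expression is obtained by differentiating y = -αx - β log(e^x + γ) twice with respect to x = log r, which gives -βγr/(r+γ)^2. This is a single bump centered at r = γ with height -β/4, which is one way to read the "impulse" shape the abstract mentions.

```python
import math


def rank_frequency(r, alpha, beta, gamma, scale=1.0):
    """Unnormalized frequency f = scale * r^(-alpha) * (r + gamma)^(-beta).

    `scale` is a hypothetical normalization constant, not part of the paper.
    """
    return scale * r ** (-alpha) * (r + gamma) ** (-beta)


def log_log_curvature(r, beta, gamma):
    """Second derivative of y = log f with respect to x = log r.

    Differentiating y = -alpha*x - beta*log(exp(x) + gamma) twice gives
    -beta * gamma * r / (r + gamma)^2. The alpha term is linear in x and
    drops out, so d^2(x + y)/dx^2 has the same value.
    """
    return -beta * gamma * r / (r + gamma) ** 2


# Illustrative parameter values only; they are not fitted to any corpus.
alpha, beta, gamma = 0.5, 0.7, 20.0

for r in (1, 10, 20, 100, 1000):
    f = rank_frequency(r, alpha, beta, gamma)
    c = log_log_curvature(r, beta, gamma)
    print(f"r={r:5d}  log f={math.log(f):8.3f}  curvature={c:7.4f}")
```

Setting `beta=0` leaves the pure power law r^(-α), and setting `alpha=0` leaves the Zipf-Mandelbrot form (r+γ)^(-β), matching the two special cases stated in the abstract.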