
Inverse population genetic problems with noise: inferring extent and structure of haplotype blocks from point allele frequencies

2024-06-20

Oliver Keatinge Clay


Abstract

A haplotype block, or simply a block, is a chromosomal segment, DNA base sequence or string that occurs in only a few variants or types in the genomes of a population of interest, and that has an encapsulated or 'private' frequency distribution of the string types that is not shared by neighbouring blocks or regions on the same chromosome. We consider two inverse problems of genetic interest: given only the frequencies of the symbol types (4 base types, the possible single-base alleles) at each position (point, base/nucleotide) along the string, infer the locations of the left and right boundaries of the block (block extent), and the number and relative frequencies of the string types occurring in the block (block structure). The large majority of variable positions in human and other (e.g., fungal) genomes appear to be biallelic, i.e., the position allows only a choice between two possible symbols. The symbols can then be encoded as 0 (major) and 1 (minor), or as +1 and -1 as in Ising models, so the scenario reduces to problems on Boolean strings/bitstrings and Boolean matrices. The specification of minor allele frequencies (MAF), as used in genetics, fits naturally into this framework. A simple example from human chromosome 9 is presented.
