Defining binary phylogenetic trees using parsimony: new bounds
Mirko Wilde, Mareike Fischer
Abstract
Phylogenetic trees are frequently used to model evolution. Such trees are typically reconstructed from data like DNA, RNA, or protein alignments using methods based on criteria like maximum parsimony (amongst others). Maximum parsimony has been assumed to work well for data with only a few state changes. Recently, some progress has been made to formally prove this assertion. For instance, it has been shown that each binary phylogenetic tree T with n ≥ 20k leaves is uniquely defined by the set A_k(T), which consists of all characters with parsimony score k on T. In the present manuscript, we show that the statement indeed holds for all n ≥ 4k, thus drastically lowering the lower bound for n from 20k to 4k. However, it has been known that for n ≤ 2k and k ≥ 3, it is not generally true that A_k(T) defines T. We improve this result by showing that the latter statement can be extended from n ≤ 2k to n ≤ 2k+2. So we drastically reduce the gap of values of n for which it is unknown whether trees T on n taxa are defined by A_k(T) from the previous interval of [2k+1, 20k-1] to the interval [2k+3, 4k-1]. Moreover, we close this gap completely for the nearest neighbor interchange (NNI) neighborhood of T in the following sense: we show that as long as n ≥ 2k+3, no tree that is one NNI move away from T (and thus very similar to T) shares the same A_k-alignment.
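To make the objects in the abstract concrete, the following minimal Python sketch computes the parsimony score of a character with Fitch's algorithm and enumerates A_k(T) by brute force. It is an illustration only, not the authors' code. The nested-tuple tree encoding and the names fitch_score, leaves, and a_k are hypothetical, and the sketch assumes that characters are binary (states 0 and 1) and that the unrooted tree is given with an arbitrary root, which does not affect the Fitch score.

```python
from itertools import product


def fitch_score(tree, character):
    """Parsimony score of `character` on `tree`, computed with Fitch's algorithm.

    tree: nested 2-tuples with leaf labels at the tips (hypothetical encoding).
    character: dict mapping every leaf label to a state.
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if not isinstance(node, tuple):       # leaf: its observed state
            return {character[node]}
        left, right = (state_set(child) for child in node)
        common = left & right
        if common:                            # children agree: no change needed
            return common
        changes += 1                          # children disagree: one state change
        return left | right

    state_set(tree)
    return changes


def leaves(tree):
    """List the leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def a_k(tree, k):
    """All binary characters with parsimony score exactly k on `tree`.

    Each character is returned as a tuple of states ordered by sorted leaf label.
    Assumption: binary characters; a character and its complement are both kept.
    """
    labels = sorted(leaves(tree))
    return {
        states
        for states in product((0, 1), repeat=len(labels))
        if fitch_score(tree, dict(zip(labels, states))) == k
    }


# Two different binary trees on the same four taxa.
T1 = ((1, 2), (3, 4))
T2 = ((1, 3), (2, 4))
print(a_k(T1, 1) == a_k(T2, 1))
```

For these two quartet trees with k = 1 the sets differ (for example, the character assigning 0, 0, 1, 1 to taxa 1 to 4 needs one change on T1 but two on T2), which is in line with the n ≥ 4k bound stated above. The enumeration visits all 2^n characters, so it is only practical for small n.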