Unambiguously expressing expectations about the content of prokaryotic genomes
Giorgio Gonnella
Abstract
In recent years, sequencing, assembling and annotating prokaryotic genomes has become increasingly easy and cheap. It is therefore increasingly feasible and interesting to perform comparative genomics analyses of new genomes against those of related organisms. Related organisms can be defined by different criteria, such as taxonomy or phenotype. Expectations regarding the contents of genomes are often expressed in scientific articles describing groups of organisms. Evaluating such expectations when a new genome becomes available requires analysing the text snippets that express them, extracting the logical elements of the text, and restating them in a formal expression that is more suitable for further automated analyses. Here we present a theoretical framework, alongside practical considerations, for expressing expectations about the content of genomes, with the purpose of enabling such comparative genomics analyses. The components of the framework include a system for the definition of groups of organisms, supported by a Prokaryotic Group Types Ontology, and a system for the definition of genomic contents, supported by a Prokaryotic Genomic Contents Definition Ontology. Finally, we discuss how the combination of these two systems may enable an unambiguous definition of absolute and relative genome content expectation rules.