A nucleotide composition constraint of genome sequences |
| |
Authors: | Zhang Chun-Ting; Zhang Ren |
| |
Institution: | Department of Physics, Tianjin University, Tianjin 300072, China. ctzhang@tju.edu.cn |
| |
Abstract: | Let a, c, g and t denote the occurrence frequencies of A, C, G and T, respectively, in a genome. We calculated the statistical quantity S = a^2 + c^2 + g^2 + t^2 for each of 809 genomes (11 archaea, 42 bacteria, 3 eukaryota, 90 phages, 36 viroids and 627 viruses) and 236 plasmids. We found that the strict inequality S < 1/3 holds for almost all of these genomes and plasmids. As a direct deduction from this observation, it is shown that (i) S is a kind of genome order index, which is negatively correlated with the Shannon H function; (ii) S < 1/3 suggests that a minimal value of the Shannon H function is required for each genome; (iii) S would be a new biological statistical quantity, useful for describing the composition features of genomes; (iv) by jointly considering Chargaff Parity Rule 2, the genomic G + C content should lie between 0.211 and 0.789. |
| |
Keywords: | Nucleotide composition; Genomic G + C content; Shannon H function; Natural selection; Genome order index |
This article is indexed in ScienceDirect, PubMed and other databases. |
|
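The quantities named in the abstract can be illustrated with a short Python sketch. This is not the authors' code: the function names, the choice of base-2 logarithms for the Shannon H function, and the handling of non-ACGT symbols are all assumptions made for illustration. The last function reproduces the G + C bounds in point (iv) under the assumption that Chargaff Parity Rule 2 holds exactly (a = t, c = g), so that with x = G + C content, S = (x^2 + (1 - x)^2) / 2, and S < 1/3 reduces to a quadratic inequality in x.

```python
import math
from collections import Counter


def base_frequencies(seq):
    """Return the frequencies (a, c, g, t) of a sequence.

    Assumption: symbols other than A, C, G, T (e.g. N) are ignored.
    """
    counts = Counter(ch for ch in seq.upper() if ch in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return tuple(counts[b] / total for b in "ACGT")


def order_index(freqs):
    """S = a^2 + c^2 + g^2 + t^2, as defined in the abstract."""
    return sum(p * p for p in freqs)


def shannon_h(freqs, base=2):
    """Shannon H function; the log base (bits here) is an assumption."""
    return -sum(p * math.log(p, base) for p in freqs if p > 0)


def gc_bounds(s_max=1 / 3):
    """G + C range implied by S < s_max when a = t and c = g.

    With x = G + C: S = (x^2 + (1 - x)^2) / 2, so S < s_max is
    x^2 - x + (1 - 2 * s_max) / 2 < 0, with roots
    x = (1 +/- sqrt(4 * s_max - 1)) / 2.
    """
    d = math.sqrt(4 * s_max - 1)
    return (1 - d) / 2, (1 + d) / 2


if __name__ == "__main__":
    # Hypothetical toy sequence, not data from the paper.
    freqs = base_frequencies("ATGCGCGATATTAGC")
    print("S =", order_index(freqs), " H =", shannon_h(freqs))

    low, high = gc_bounds()
    print(round(low, 3), round(high, 3))  # 0.211 0.789
```

For a uniform composition (a = c = g = t = 1/4), S takes its minimum value of 1/4 while H takes its maximum of 2 bits, which is consistent with the negative correlation between S and H described in point (i).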