Affiliation: | (1) Department of Physics, Nanjing University, National Laboratory of Solid State Microstructure, Institute of Biophysics, Nanjing, 210093, P.R. China; (2) Interdisciplinary Center of Theoretical Studies, Chinese Academy of Sciences, Beijing, 100080, P.R. China |
Abstract: | The size-dependent complexity of protein sequences in families of the FSSP database is characterized by sequence entropy, sequence similarity, and sequence identity. As the average length Lf of the sequences in a family increases, the sequence entropy increases while the sequence similarity and sequence identity decrease. For Lf > 250, the sequence entropy, sequence similarity, and sequence identity all saturate. This saturation of complexity is attributed to the saturation of the probability Pg of global (long-range) interactions in protein structures when Lf > 250. The alphabet size of residue types needed to describe the sequence diversity is also found to depend on Lf, and it saturates at 12. |
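The abstract does not spell out how the complexity measures are computed, so the following is only an illustrative sketch under stated assumptions: sequence entropy is taken as the column-wise Shannon entropy of an aligned family, averaged over columns, and sequence identity as the mean fraction of matching residues over all sequence pairs. The function names, the gap handling, and the toy alignment are hypothetical and not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def column_entropy(column):
    """Shannon entropy (bits) of the residue distribution in one alignment column."""
    residues = [r for r in column if r != "-"]  # assumption: gaps are ignored
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def family_entropy(alignment):
    """Mean column entropy of an alignment given as equal-length strings."""
    columns = list(zip(*alignment))
    return sum(column_entropy(col) for col in columns) / len(columns)


def pairwise_identity(a, b):
    """Fraction of positions, non-gap in both sequences, where residues match."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def family_identity(alignment):
    """Mean pairwise identity over all distinct sequence pairs in the family."""
    pairs = list(combinations(alignment, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy family of four aligned sequences (not real FSSP data).
toy_family = ["ACDE", "ACDF", "ACGF", "AC-F"]

if __name__ == "__main__":
    print("mean column entropy (bits):", family_entropy(toy_family))
    print("mean pairwise identity:", family_identity(toy_family))
```

Under these assumed definitions, a more diverse family gives a higher entropy and a lower identity, which is the direction of the trends the abstract reports as Lf grows. Repeating the calculation on sequences recoded into reduced residue alphabets would be one way to probe the alphabet-size statement, though the grouping scheme used in the paper is not given here.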