

Segmentation of Heteropolymer Sequences Specifying Subsequences with Different Composition and Statistical Properties
Authors: Leonid V. Gusev, Valentina V. Vasilevskaya, Vsevolod Ju. Makeev, Pavel G. Khalatur, Alexei R. Khokhlov
Abstract: We have studied the segmentation of two‐letter AB heterosequences composed of subsequences that differ in composition and in the distribution of A and B monomer units along the chain. Our approach is based on the segmentation function S(k), introduced in the present work, and on the Jensen–Shannon divergence measure computed from the probabilities of the lengths of uniform blocks of A and B monomer units. It is shown that the function S(k) is extremely sensitive to the sequence statistics; even visual inspection of S(k) reveals some features of these statistics. In particular, S(k) is constant for random copolymers, oscillates for random block copolymers, and grows monotonically up to some constant value for proteinlike copolymers. However, because of significant fluctuations observed for short sequences, S(k) can be used effectively only to segment heterosequences composed of very long subsequences. On the other hand, we find that the Jensen–Shannon divergence measure does not allow one to judge the type of statistics but is extremely efficient for segmenting a heterosequence. Therefore, the two functions are mutually complementary and together provide an effective approach to the recognition and segmentation of heterosequences. As an example, the methods developed are applied to concatenated sequences of different proteins.
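The abstract does not spell out the divergence-based procedure, so the following is only a minimal sketch of Jensen–Shannon segmentation on block-length statistics, under stated assumptions: a "uniform block" is taken to be a maximal run of identical monomers, the probability distribution is taken over (monomer type, block length) pairs, the two segment distributions are weighted by segment length, and a single cut point is found by an exhaustive scan. The function names and the `min_len` parameter are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

import math
from collections import Counter
from itertools import groupby


def block_length_distribution(seq: str) -> dict[tuple[str, int], float]:
    """Probability of each (monomer, block length) pair among the uniform blocks of seq.

    Assumption: a uniform block is a maximal run of identical letters.
    """
    blocks = [(unit, sum(1 for _ in run)) for unit, run in groupby(seq)]
    counts = Counter(blocks)
    total = len(blocks)
    return {key: n / total for key, n in counts.items()}


def entropy(p: dict) -> float:
    """Shannon entropy in bits."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def jensen_shannon(p: dict, q: dict, w1: float = 0.5, w2: float = 0.5) -> float:
    """Weighted Jensen-Shannon divergence H(w1*p + w2*q) - w1*H(p) - w2*H(q), with w1 + w2 = 1."""
    keys = set(p) | set(q)
    mix = {k: w1 * p.get(k, 0.0) + w2 * q.get(k, 0.0) for k in keys}
    return entropy(mix) - w1 * entropy(p) - w2 * entropy(q)


def best_split(seq: str, min_len: int = 20) -> tuple[int | None, float]:
    """Scan every cut point and return the one maximizing the divergence.

    Assumption: weights are proportional to segment lengths; min_len keeps
    both segments long enough for their block statistics to be meaningful.
    """
    n = len(seq)
    best_pos, best_d = None, -1.0
    for i in range(min_len, n - min_len + 1):
        p = block_length_distribution(seq[:i])
        q = block_length_distribution(seq[i:])
        d = jensen_shannon(p, q, i / n, (n - i) / n)
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


if __name__ == "__main__":
    # Alternating part (all blocks of length 1) followed by a part with blocks of length 4.
    seq = "AB" * 100 + "AAAABBBB" * 25
    print(best_split(seq))  # → (200, 1.0)
```

In this toy example the two halves share no (monomer, block length) pairs, so the divergence reaches its ceiling of 1 bit exactly at the junction. Applying `best_split` recursively to each resulting segment, with some significance threshold as a stopping rule, is one common way to obtain more than two segments; the threshold is not specified in the abstract.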

Segmentation function S(k, l, x) as a function of the parameter k and the starting position x of the “window”, for a sequence composed of elastin and ribonuclease sequences.

Keywords: biopolymers; calculations; Jensen–Shannon divergence measure; segmentation function