Chaos game representation (CGR)-walk model for DNA sequences
Cite this article: Gao Jie, Xu Zhen-Yuan. Chaos game representation (CGR)-walk model for DNA sequences[J]. Chinese Physics B, 2009, 18(1): 370-376.
Authors: Gao Jie, Xu Zhen-Yuan
Affiliations: (1) School of Science, Jiangnan University, Wuxi 214122, China; School of Information Technology, Jiangnan University, Wuxi 214122, China; (2) School of Science, Jiangnan University, Wuxi 214122, China
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 60575038) and the Natural Science Foundation of Jiangnan University, China (Grant No. 20070365).
Abstract: Chaos game representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, to determine the coordinates of their positions in a continuous space. This distribution of positions has two features: it is unique, and the source sequence can be recovered from the coordinates, so that the distance between positions may serve as a measure of similarity between the corresponding sequences. A CGR-walk model for DNA sequences is proposed based on the CGR coordinates. The CGR coordinates are converted into a time series, and a long-memory ARFIMA(p, d, q) model, where ARFIMA stands for autoregressive fractionally integrated moving average, is introduced into DNA sequence analysis. The model is applied to simulate real CGR-walk sequence data from ten genomic sequences. Remarkable long-range correlations are uncovered in the data, and the results are fitted reasonably well by the ARFIMA(p, d, q) model.
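
The iterative mapping and the recoverability property described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used CGR layout (corners A, C, G, T of the unit square, starting point at the centre, each step moving halfway toward the corner of the current nucleotide); the abstract does not state the corner assignment, so `CORNERS`, `START`, and both function names are illustrative choices, not the authors' code. The conversion of the coordinates into a CGR-walk time series is not reproduced here because the abstract does not define it.

```python
from fractions import Fraction

# Assumed corner layout (a common convention; not specified in the abstract).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
START = (Fraction(1, 2), Fraction(1, 2))  # centre of the unit square


def cgr_coordinates(seq):
    """Return the CGR point reached after each nucleotide of seq."""
    x, y = START
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the nucleotide's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def recover_sequence(point):
    """Recover the source sequence from one CGR point.

    Assumes point was produced by cgr_coordinates (exact Fractions);
    an arbitrary point may never walk back to START.
    """
    corner_to_base = {corner: base for base, corner in CORNERS.items()}
    half = Fraction(1, 2)
    x, y = point
    bases = []
    while (x, y) != START:
        # The quadrant containing the point identifies the last nucleotide.
        cx, cy = int(x > half), int(y > half)
        bases.append(corner_to_base[(cx, cy)])
        # Undo the halving step to obtain the previous point.
        x, y = 2 * x - cx, 2 * y - cy
    return "".join(reversed(bases))


if __name__ == "__main__":
    pts = cgr_coordinates("GATTACA")
    print(pts[-1])
    print(recover_sequence(pts[-1]))
```

Exact rational arithmetic (`Fraction`) is used so that the inversion stays exact for sequences of any length; with floating point, each step consumes one bit of precision. Under this layout, two sequences that share their last k nucleotides end at points whose coordinates differ by less than 2^(-k), which is the sense in which distance between positions reflects sequence similarity.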

Keywords: CGR-walk model; DNA sequence; long-memory; ARFIMA(p, d, q) model
Received: 2008-04-24

This article is indexed in VIP and other databases.