(1) Department of Applied Mathematics, National Donghwa University, Hualien, 974, Taiwan, Republic of China
Abstract:
The objective is to derive the probability distribution of the frequency of occurrence of a subsequence within a nucleotide sequence, under the hypothesis that the four nucleotides occur at random and with equal probability. We also consider the compound Poisson approximation to the same distribution. The exact probability distribution can be obtained by the finite Markov chain imbedding technique introduced by Fu and Koutras (1994); the same technique also handles the case in which the nucleotide probabilities are not all equal. The compound Poisson approximation obtained by the Stein-Chen method yields an approximate probability distribution, provided the sets of dependence are defined appropriately. This structure gives a bound on the total variation distance, which tends to grow relatively larger as the frequency increases.
AMS 2000 Subject Classification: Primary: 60E05; Secondary: 60J10
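
As a rough illustration of the finite Markov chain imbedding idea, the sketch below computes the exact distribution of the number of occurrences of a pattern in a random nucleotide sequence of length n. It is not taken from the paper: the function names `build_automaton` and `pattern_count_distribution` are hypothetical, occurrences are assumed to be counted with overlaps allowed, and the letters are assumed to be independent. The chain state is the pair (length of the longest pattern prefix matched so far, number of occurrences so far).

```python
from fractions import Fraction

ALPHABET = "ACGT"


def build_automaton(pattern):
    """Transition table of the imbedded chain (illustrative helper).

    State k means: the longest prefix of `pattern` that is a suffix of
    the letters read so far has length k.
    """
    m = len(pattern)
    trans = []
    for state in range(m + 1):
        row = {}
        for ch in ALPHABET:
            text = pattern[:state] + ch
            k = min(m, len(text))
            # Shrink k until pattern[:k] is a suffix of the text read.
            while k > 0 and pattern[:k] != text[-k:]:
                k -= 1
            row[ch] = k
        trans.append(row)
    return trans


def pattern_count_distribution(pattern, n, probs=None):
    """Return [P(N = 0), P(N = 1), ...] for the overlapping count N.

    `probs` maps each nucleotide to its probability; by default all four
    are 1/4. Unequal probabilities are handled by the same recursion.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    if probs is None:
        probs = {ch: Fraction(1, 4) for ch in ALPHABET}
    m = len(pattern)
    trans = build_automaton(pattern)
    max_count = max(n - m + 1, 0)  # at most n - m + 1 overlapping hits

    # dist[state][count] = probability of being in (state, count).
    dist = [[Fraction(0)] * (max_count + 1) for _ in range(m + 1)]
    dist[0][0] = Fraction(1)
    for _ in range(n):
        new = [[Fraction(0)] * (max_count + 1) for _ in range(m + 1)]
        for state in range(m + 1):
            for count in range(max_count + 1):
                p = dist[state][count]
                if p == 0:
                    continue
                for ch in ALPHABET:
                    nxt = trans[state][ch]
                    # Reaching state m completes one occurrence.
                    c = count + (1 if nxt == m else 0)
                    new[nxt][c] += p * probs[ch]
        dist = new
    # Marginalise over the matching state to get the count distribution.
    return [sum(dist[s][c] for s in range(m + 1))
            for c in range(max_count + 1)]
```

For example, `pattern_count_distribution("AA", 3)` gives the probabilities 57/64, 3/32 and 1/64 for zero, one and two occurrences. Exact rational arithmetic via `Fraction` is used here only to make small cases easy to check by hand; floating-point values would be the practical choice for long sequences.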