Similar articles
1.
Sertac Eroglu 《Complexity》2015,21(1):268-282
In a genome, genes (coding constituents) are interrupted by intergenic regions (noncoding constituents). This study provides a general picture of the large‐scale self‐organization of coding, noncoding, and total constituent lengths in genomes. Ten model genomes were examined and strong correlations between the number of genomic constituents and the constituent lengths were observed. The analysis was carried out by adopting a linguistic distribution model and a structural analogy between linguistic and genomic constructs. The proposed linguistic‐based statistical analysis may provide a fundamental basis for both understanding the linear structural formation of genomic constituents and developing insightful strategies to figure out the function of genic and intergenic regions in genomic sequences. © 2014 Wiley Periodicals, Inc. Complexity 21: 268–282, 2015

2.
The concept of the spectral envelope was introduced as a statistical basis for the frequency domain analysis and scaling of qualitative-valued time series. A major focus of this research was the analysis of DNA sequences. A common problem in analyzing long DNA sequence data is to identify coding sequences that are dispersed throughout the DNA and separated by noncoding regions. Even within short subsequences of DNA, one encounters local behavior. To address this problem of local behavior in categorical-valued time series, we explore using the spectral envelope in conjunction with a dyadic tree-based adaptive segmentation method for analyzing piecewise stationary processes.

3.
This paper presents a method for annotating coding and noncoding DNA regions by using variable order Markov (VOM) models. A main advantage in using VOM models is that their order may vary for different sequences, depending on the sequences’ statistics. As a result, VOM models are more flexible with respect to model parameterization and can be trained on relatively short sequences and on low-quality datasets, such as expressed sequence tags (ESTs). The paper presents a modified VOM model for detecting and correcting insertion and deletion sequencing errors that are commonly found in ESTs. In a series of experiments the proposed method is found to be robust to random errors in these sequences.

4.
We model the evolution of biological and linguistic sequences by comparing their statistical properties. This comparison is performed by means of efficiently computable kernel functions, which take two sequences as input and return a measure of statistical similarity between them. We show how the use of such kernels allows us to reconstruct the phylogenetic trees of primates based on the mitochondrial DNA (mtDNA) of existing animals, and the phylogenetic tree of Indo-European and other languages based on sample documents from existing languages. Kernel methods provide a convenient framework for many pattern analysis tasks, and recent advances have focused on efficient methods for sequence comparison and analysis. While a large toolbox of algorithms has been developed to analyze data by using kernels, in this paper we demonstrate their use in combination with standard phylogenetic reconstruction algorithms and visualization methods.

5.
We previously constructed the theory of quantum thermodynamics, which assigns operators to dual variables (for example, pressure and volume or temperature and entropy, i.e., dual pairs of extensive and intensive variables similar to momenta and coordinates in classical mechanics) similarly to the theory of quantum mechanics. Here we show that in both the bosonic and fermionic cases, the quantized entropy introduced as an operator in special Fock spaces containing a new variable, called the statistical spin, depends on some variables that do not affect any physical results and are hence called ghosts.

6.
We collect several observations that concern variable-length coding of two-sided infinite sequences in a probabilistic setting. Attention is paid to images and preimages of asymptotically mean stationary measures defined on subsets of these sequences. We point out sufficient conditions under which the variable-length coding and its inverse preserve asymptotic mean stationarity. Moreover, conditions for preservation of shift-invariant σ-fields and the finite-energy property are discussed, and the block entropies for stationary means of coded processes are related in some cases. Subsequently, we apply certain of these results to construct a stationary nonergodic process with a desired linguistic interpretation.

7.
The long-range scaling behaviours of human colonic pressure activities under normal physiological conditions are studied by using the method of detrended fluctuation analysis (DFA). DFA is an effective representation that uses a single quantitative scaling exponent α to quantify the long-range correlations naturally present in a complex non-stationary time series. The method shows that the colonic activities of the healthy subjects exhibit long-range power-law correlations; however, such correlations are destroyed if the original data are randomly shuffled, and they cease to follow a power law if some high-amplitude spikes are chopped off. These facts indicate that the colonic tissue or enteric nervous system (ENS) with good functional motility has a good memory of its past behaviours and generates well-organized colonic spikes. In the slow transit constipation (STC) patient, however, this memory becomes too long to be retained, and colonic dysmotility occurs.
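The DFA procedure mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes first-order (linear) detrending, non-overlapping windows, and hypothetical helper names (`dfa_exponent`, `white_noise`), and it uses synthetic Gaussian noise rather than colonic pressure data. For uncorrelated noise the estimated exponent α should be close to 0.5, while an integrated series (a random walk) should give a clearly larger value.

```python
import math
import random


def linear_fit_residual_ss(segment):
    """Sum of squared residuals after removing a least-squares line."""
    n = len(segment)
    t_mean = (n - 1) / 2.0
    y_mean = sum(segment) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment))
    slope = s_ty / s_tt
    return sum((y - y_mean - slope * (t - t_mean)) ** 2
               for t, y in enumerate(segment))


def dfa_exponent(series, window_sizes):
    """Estimate the DFA scaling exponent alpha (linear detrending)."""
    mean = sum(series) / len(series)
    # Step 1: integrate the mean-removed series to obtain the profile.
    profile, running = [], 0.0
    for x in series:
        running += x - mean
        profile.append(running)

    log_n, log_f = [], []
    for n in window_sizes:
        n_windows = len(profile) // n
        if n < 4 or n_windows < 1:
            continue  # window too short for a fit, or longer than the data
        # Step 2: detrend each non-overlapping window and pool the residuals.
        total = sum(linear_fit_residual_ss(profile[w * n:(w + 1) * n])
                    for w in range(n_windows))
        fluctuation = math.sqrt(total / (n_windows * n))
        if fluctuation > 0.0:
            log_n.append(math.log(n))
            log_f.append(math.log(fluctuation))

    if len(log_n) < 2:
        raise ValueError("need at least two usable window sizes")
    # Step 3: alpha is the slope of log F(n) against log n.
    mx = sum(log_n) / len(log_n)
    my = sum(log_f) / len(log_f)
    return (sum((x - mx) * (y - my) for x, y in zip(log_n, log_f))
            / sum((x - mx) ** 2 for x in log_n))


def white_noise(length, seed=0):
    """Synthetic uncorrelated Gaussian series (illustration only)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


if __name__ == "__main__":
    print(dfa_exponent(white_noise(4096), [8, 16, 32, 64, 128]))
```

Shuffling a correlated series before calling `dfa_exponent` is one way to reproduce the kind of surrogate check the abstract describes, since shuffling removes temporal ordering.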

8.
9.
The importance of statistical patterns of language has been debated for decades. Although Zipf's law is perhaps the most popular case, Menzerath's law has recently entered the debate. Menzerath's law manifests in language, music and genomes as a tendency of the mean size of the parts to decrease as the number of parts increases. In genomes, for instance, this statistical regularity emerges as a tendency of species with more chromosomes to have a smaller mean chromosome size. It has been argued that the instantiation of this law in genomes is not indicative of any parallel between language and genomes because (a) the law is inevitable and (b) noncoding DNA dominates genomes. Here mathematical, statistical, and conceptual challenges of these criticisms are discussed. Two major conclusions are drawn: the law is not inevitable and languages also have a correlate of noncoding DNA. However, the wide range of manifestations of the law in and outside genomes suggests that the striking similarities between noncoding DNA and certain linguistic units could be anecdotal for understanding the recurrence of that statistical law. © 2012 Wiley Periodicals, Inc. Complexity, 2012

10.
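To address data redundancy in complex system analysis, an attribute reduction algorithm based on the information entropy of Vague rough sets is proposed. First, the relevant concepts of Vague rough sets are extended, and models of extended information entropy and generalized information entropy for Vague rough sets are put forward. Second, the entropy-based measure of attribute significance and the principle of attribute reduction are studied, leading to a supervised attribute reduction algorithm based on Vague rough set information entropy. Finally, data sets from the UCI repository are used to verify the performance of the algorithm; the computational results show that the algorithm is practical and effective.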

11.
For partially defined (fuzzy) data we investigate the optimal coding that permits reconstructing some crispification of the data from the code. The combinatorial entropy is obtained for the class of partially defined sequences with given parameters and a coding theorem is proved for fuzzy information sources. The properties of the corresponding entropy are studied and compared with the properties of Shannon entropy. Translated from Nelineinaya Dinamika i Upravlenie, No. 4, pp. 377–396, 2004.

12.
The weak form of the efficient market hypothesis (EMH) establishes that price returns behave as a pure random process, so their outcomes cannot be forecast. Detrended fluctuation analysis (DFA) has been widely used to test the weak form of the EMH by checking whether time series of price returns are serially uncorrelated; serial correlations show up as deviations of the DFA scaling exponent from the theoretical value of 0.5. This work considers the test of the EMH with DFA implemented on a sliding window, an approach intended to monitor the evolution of markets. Under these conditions, the scaling exponent exhibits important variations over the scrutinized period that can offer valuable insights into the behavior of the market, provided the estimated scaling value is subjected to strict statistical tests to verify the presence or absence of serial correlations in the price returns. In this work, the statistical tests are based on comparing the estimated scaling exponent with the values obtained from pure Gaussian sequences with the same length as the real time series. In this way, the presence of serial correlations can be guaranteed only in terms of the confidence bands of a pure Gaussian process. The crude oil (WTI) and the USA stock (DJIA) markets are used to illustrate the methodology.

13.
The concepts of conditional entropy of a physical system given the state of another system and of information in a physical system about another one are generalized for quantum systems. The fundamental difference between the classical case and the quantum one is that the entropy and information in quantum systems depend on the choice of measurements performed over the systems. It is shown that some equalities of the classical information theory turn into inequalities for the generalized quantities. Specific quantum phenomena such as EPR pairs and superdense coding are described and explained in terms of the generalized conditional entropy and information.

14.
In this paper, we present two compression methods for irregular three-dimensional (3-D) mesh sequences with constant connectivity. The proposed methods mainly use an exact integer spatial wavelet analysis (SWA) technique to efficiently decorrelate the spatial coherence of each mesh frame and also to adaptively transmit mesh frames with various spatial resolutions. To reduce the temporal redundancy, the first proposed method applies multi-order differential coding (MDC) to the temporal sequences obtained from SWA. MDC determines the optimal order of the differential coder by analyzing the variance of prediction errors. Compared with the first-order differential coding (FDC) scheme, the method can improve the compression performance. The second proposed method applies temporal wavelet analysis (TWA) to the temporal sequences. In particular, this method offers spatio-temporal multi-resolution coding. Through simulations, we show that our methods enable efficient lossy-to-lossless compression for 3-D mesh sequences in a single framework.
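The idea of choosing a differential-coding order from the variance of the prediction errors can be sketched as follows. This is only an illustration under simplifying assumptions, not the MDC scheme of the paper: it treats a single integer sequence, takes the k-th order prediction error to be the k-th finite difference, and uses hypothetical names (`difference`, `variance`, `best_order`).

```python
def difference(seq, order):
    """Return the order-th finite difference of an integer sequence."""
    out = list(seq)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def variance(values):
    """Population variance; defined as 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_order(seq, max_order=3):
    """Pick the lowest order whose residuals have the smallest variance.

    Order 0 means the raw samples are coded without prediction.
    """
    return min(range(max_order + 1),
               key=lambda k: variance(difference(seq, k)))


if __name__ == "__main__":
    quadratic = [t * t for t in range(20)]
    print(best_order(quadratic))        # order selected for a quadratic trend
    print(difference(quadratic, 2))     # residuals a coder would then encode
```

Because the differences are exact integer operations, the original samples can be recovered from the residuals plus the first few values, which is what makes this kind of predictor compatible with lossless coding.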

15.
The present paper is devoted to the study of scaling sequences that occur in the definition of entropy type invariants. The necessity to distinguish nonstandard sequences with zero entropy leads to a generalization of the entropy of decreasing sequences of measurable partitions. A more refined entropy type invariant, the scaling entropy considered by A. M. Vershik, is based on the notion of $\varepsilon$-entropy of a metric space with measure. In the present work it is shown that the scaling entropy is a generalization of the entropy of decreasing sequences if $2^n$ is taken as the scaling sequence. The scaling entropy of the partition into the pasts of the $(T, T^{-1})$-endomorphisms is calculated. Bibliography: 11 titles.

16.
We propose a 1D adaptive numerical scheme for hyperbolic conservation laws based on the numerical density of entropy production (the amount of violation of the theoretical entropy inequality). This density is used as an a posteriori error indicator that signals whether the mesh should be refined in regions where discontinuities occur or coarsened in regions where the solution remains smooth. Because the Courant–Friedrichs–Lewy stability condition restricts the time step and leads to time-consuming simulations, we propose a local time-stepping algorithm. We also use high-order time extensions based on the Adams–Bashforth time integration technique, as well as second-order linear reconstruction in space. We numerically investigate the efficiency of the scheme through several test cases: Sod’s shock tube problem, Lax’s shock tube problem and the Shu–Osher test problem.

17.
The pseudo-randomness and complexity of binary sequences generated by chaotic systems are investigated in this paper. These chaotic binary sequences can have the same pseudo-randomness and complexity as the chaotic real sequences that are transformed into them by the use of Kohda’s quantification algorithm. Statistical tests, the correlation function, spectral analysis, Lempel–Ziv complexity and approximate entropy are used as quantitative measures to characterize the pseudo-randomness and complexity of these binary sequences. The experimental results show that the finite binary sequences generated by the chaotic systems have good pseudo-randomness and complexity. However, the pseudo-randomness and complexity do not grow as the sequence length increases. On the contrary, they steadily decrease with increasing sequence length according to the approximate entropy and statistical test criteria. The limited computational precision is a fundamental cause of this problem. Therefore, unless some other way of adding sequence complexity is used, only the shorter binary sequences generated by chaotic systems on existing computer systems are suitable for modern cryptography.

18.
We argue that a Golay complementary sequence is naturally viewed as a projection of a multi-dimensional Golay array. We present a three-stage process for constructing and enumerating Golay array and sequence pairs:
  1. construct suitable Golay array pairs from lower-dimensional Golay array pairs;
  2. apply transformations to these Golay array pairs to generate a larger set of Golay array pairs; and
  3. take projections of the resulting Golay array pairs to lower dimensions.
This process greatly simplifies previous approaches, by separating the construction of Golay arrays from the enumeration of all possible projections of these arrays to lower dimensions. We use this process to construct and enumerate all $2^h$-phase Golay sequences of length $2^m$ obtainable under any known method, including all 4-phase Golay sequences obtainable from the length 16 examples given in 2005 by Li and Chu [Y. Li, W.B. Chu, More Golay sequences, IEEE Trans. Inform. Theory 51 (2005) 1141-1145].
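The defining property of a Golay complementary pair, namely that the aperiodic autocorrelations of the two sequences sum to zero at every nonzero shift, can be checked with a short sketch. This is not the three-stage array construction of the abstract: it assumes binary (±1) sequences only, uses the classical concatenation rule (a, b) → (a|b, a|−b) to build examples, and the function names are hypothetical.

```python
def aperiodic_autocorrelation(seq, shift):
    """Aperiodic autocorrelation of a real-valued sequence at a shift >= 0."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def is_golay_pair(a, b):
    """True if the autocorrelations of a and b cancel at all nonzero shifts."""
    if len(a) != len(b):
        return False
    return all(
        aperiodic_autocorrelation(a, u) + aperiodic_autocorrelation(b, u) == 0
        for u in range(1, len(a))
    )


def double_pair(a, b):
    """Classical concatenation step: (a, b) -> (a|b, a|-b)."""
    return a + b, a + [-x for x in b]


def binary_golay_pair(m):
    """Binary Golay pair of length 2**m built by repeated doubling."""
    a, b = [1], [1]
    for _ in range(m):
        a, b = double_pair(a, b)
    return a, b


if __name__ == "__main__":
    a, b = binary_golay_pair(4)  # length 16
    print(a)
    print(b)
    print(is_golay_pair(a, b))
```

Extending the check to 4-phase or general $2^h$-phase sequences would require complex-valued entries and a conjugate in the autocorrelation, which this sketch deliberately leaves out.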

19.
We give the asymptotic behavior of the Mann–Whitney U-statistic for two independent stationary sequences. The result applies to a large class of short-range dependent sequences, including many nonmixing processes in the sense of Rosenblatt [17]. We also give some partial results in the long-range dependent case, and we investigate other related questions. Based on the theoretical results, we propose some simple corrections of the usual tests for stochastic domination; next we simulate different (nonmixing) stationary processes to see that the corrected tests perform well.
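For reference, the Mann–Whitney U-statistic itself can be computed directly from its pairwise definition. The sketch below is a plain illustration with a hypothetical function name; it counts ties as one half and does not implement the dependence corrections proposed in the abstract.

```python
def mann_whitney_u(xs, ys):
    """Count pairs (x, y) with x < y; ties contribute one half."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x < y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


if __name__ == "__main__":
    first = [1.2, 0.4, 2.5, 1.9]
    second = [2.1, 3.3, 0.9, 2.8, 3.0]
    u_xy = mann_whitney_u(first, second)
    u_yx = mann_whitney_u(second, first)
    # The two statistics always sum to the number of pairs.
    print(u_xy, u_yx, len(first) * len(second))
```

The quadratic pairwise loop is chosen for clarity; a rank-based computation gives the same value more efficiently for long sequences.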

20.
Fierce competition and the globalization of the market economy urge manufacturing organizations to adopt advanced manufacturing paradigms in order to sustain themselves in global markets. Supply chain management is an essential ingredient of advanced manufacturing systems, and finding the partners that best fit the existing supply chain is a vital issue in managing supply chains. Owing to the decision makers' fields of knowledge and the nature of the evaluated attributes, assessments come in different formats, which were first unified into linguistic terms in the standard linguistic set. Two additive fuzzy measures were used to model pairwise interactions between criteria and to derive special expressions of the Marichal entropy and the Choquet integral that are more convenient to use in practice. Fuzzy measures were identified by maximizing the Marichal entropy. The decision-making procedure was illustrated with an example from the automobile manufacturing industry and compared with other methods, demonstrating its feasibility and advantages. Its inadequacies and directions for further research were also discussed.
