首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 12 毫秒
1.
Space efficient linear time construction of suffix arrays   总被引:1,自引:0,他引:1  
We present a linear time algorithm to sort all the suffixes of a string over a large alphabet of integers. The sorted order of suffixes of a string is also called suffix array, a data structure introduced by Manber and Myers that has numerous applications in pattern matching, string processing, and computational biology. Though the suffix tree of a string can be constructed in linear time and the sorted order of suffixes derived from it, a direct algorithm for suffix sorting is of great interest due to the space requirements of suffix trees. Our result is one of the first linear time suffix array construction algorithms, which improve upon the previously known O(nlogn) time direct algorithms for suffix sorting. It can also be used to derive a different linear time construction algorithm for suffix trees. Apart from being simple and applicable for alphabets not necessarily of fixed size, this method of constructing suffix trees is more space efficient.  相似文献   

2.
We present a new variant of the suffix tree called a distributed suffix tree (DST) which allows for much larger databases of strings to be handled efficiently. The method is based on a new linear time construction algorithm for subtrees of a suffix tree. The new data structure tackles the memory bottleneck problem by constructing these subtrees independently and in parallel. It is designed for distributed memory parallel computing environments (e.g., Beowulf clusters). The central advantage is that standard operations of biological importance on suffix trees are shown to be easily translatable to this new data structure. While none of these operations on the DST require inter-process communication, many have optimal expected parallel running times.  相似文献   

3.
The suffix binary search tree and suffix AVL tree   总被引:1,自引:0,他引:1  
Suffix trees and suffix arrays are classical data structures that are used to represent the set of suffixes of a given string, and thereby facilitate the efficient solution of various string processing problems—in particular on-line string searching. Here we investigate the potential of suitably adapted binary search trees as competitors in this context. The suffix binary search tree (SBST) and its balanced counterpart, the suffix AVL-tree, are conceptually simple, relatively easy to implement, and offer time and space efficiency to rival suffix trees and suffix arrays, with distinct advantages in some circumstances—for instance in cases where only a subset of the suffixes need be represented.

Construction of a suffix BST for an n-long string can be achieved in O(nh) time, where h is the height of the tree. In the case of a suffix AVL-tree this will be O(nlogn) in the worst case. Searching for an m-long substring requires O(m+l) time, where l is the length of the search path. In the suffix AVL-tree this is O(m+logn) in the worst case. The space requirements are linear in n, generally intermediate between those for a suffix tree and a suffix array.

Empirical evidence, illustrating the competitiveness of suffix BSTs, is presented.  相似文献   


4.
New text indexing functionalities of the compressed suffix arrays are proposed. The compressed suffix array proposed by Grossi and Vitter is a space-efficient data structure for text indexing. It occupies only O(n) bits for a text of length n; however it also uses the text itself that occupies bits for the alphabet . In this paper we modify the data structure so that pattern matching can be done without any access to the text. In addition to the original functions of the compressed suffix array, we add new operations search, decompress and inverse to the compressed suffix arrays. We show that the new index can find occ occurrences of any substring P of the text in O(|P|logn+occlogεn) time for any fixed 1ε>0 without access to the text. The index also can decompress a part of the text of length m in O(m+logεn) time. For a text of length n on an alphabet such that , our new index occupies only bits where is the order-0 entropy of the text. Especially for ε=1 the size is bits. Therefore the index will be smaller than the text, which means we can perform fast queries from compressed texts.  相似文献   

5.
6.
A random suffix search tree is a binary search tree constructed for the suffixes Xi = 0 · BiBi+1Bi+2… of a sequence B1, B2, B3, … of independent identically distributed random b‐ary digits Bj. Let Dn denote the depth of the node for Xn in this tree when B1 is uniform on ?b. We show that for any value of b > 1, ??Dn = 2 log n + O(log2log n), just as for the random binary search tree. We also show that Dn/??Dn1 in probability. © 2003 Wiley Periodicals, Inc. Random Struct. Alg., 2003  相似文献   

7.
We present parallel lightweight algorithms to construct wavelet trees, rank and select structures, and suffix arrays in a shared-memory setting. The work and depth of our first parallel wavelet tree algorithm match those of the best existing parallel algorithm while requiring asymptotically less memory and our second algorithm achieves the same asymptotic bounds for small alphabet sizes. Our experiments show that they are both faster and more memory-efficient than existing parallel algorithms. We also present an experimental evaluation of the parallel construction of rank and select structures, which are used in wavelet trees. Next, we design the first parallel suffix array algorithm based on induced copying. Our induced copying requires linear work and polylogarithmic depth for constant alphabets sizes. When combined with a parallel prefix doubling algorithm, it is more efficient in practice both in terms of running time and memory usage compared to existing parallel implementations. As an application, we combine our algorithms to build the FM-index in parallel.  相似文献   

8.
Invented in the 1970s, the Suffix Tree (ST) is a data structure that indexes all substrings of a text in linear space. Although more space demanding than other indexes, the ST remains likely an inspiring index because it represents substrings in a hierarchical tree structure. Along time, STs have acquired a central position in text algorithmics with myriad of algorithms and applications to for instance motif discovery, biological sequence comparison, or text compression. It is well known that different words can lead to the same suffix tree structure with different labels. Moreover, the properties of STs prevent all tree structures from being STs. Even the suffix links, which play a key role in efficient construction algorithms and many applications, are not sufficient to discriminate the suffix trees of distinct words. The question of recognising which trees can be STs has been raised and termed Reverse Engineering on STs. For the case where a tree is given with potential suffix links, a seminal work provides a linear time solution only for binary alphabets. Here, we also investigate the Reverse Engineering problem on ST with links and exhibit a novel approach and algorithm. Hopefully, this new suffix tree characterisation makes up a valuable step towards a better understanding of suffix tree combinatorics.  相似文献   

9.
This paper describes an approximation solution method for the car sequencing problem with colors. Firstly, we study the optimality of problems with a single ratio constraint. This study also introduces a data structure for efficient calculation of the penalties related to ratio constraints. We describe the constructive greedy algorithm and variable neighborhood search adjusted for the problem with colors. Tabu metaheuristic is used to improve the results obtained by VNS. We then represent the cars with their constraints as letters over an alphabet and apply the algorithm to spell the motifs in order to improve the number of batch colors without decreasing the costs associated to the set of ratio constraints. The algorithm achieves 19 out of the 64 best results for instance sets A and B. These instances are the reference instances for Challenge ROADEF.  相似文献   

10.
The notion of balanced bipartitions of the vertices in a tree T was introduced and studied by Reid (Networks 34 (1999) 264). Reid proved that the set of balance vertices of a tree T consists of a single vertex or two adjacent vertices. In this note, we give a simple proof of that result.  相似文献   

11.
12.
Let G be a graph with a perfect matching M. In this paper, we prove two theorems to characterize the graph G in which there is no M-alternating path between two vertices x and y in G.  相似文献   

13.
Bioinformatics, the discipline which studies the computational problems arising from molecular biology, poses many interesting problems to the string searching community. We will describe two problems arising from Bioinformatics, their preliminary solutions, and the more general problem that they pose. The first problem is searching for α-helices in protein sequences. This particular instance of the search is based on matching of hydrophobicity/hydrophilicity. We find an algorithm which is linear in the sequence length for fixed helix length and is O(nlogn) for any helix length. The second problem is on matching probabilistic sequences against sequences or against other probabilistic sequences. In both cases we derive efficient formulas to compute scores according to a Markovian model of evolution.  相似文献   

14.
图G的零阶广义Randi指标定义为0Rα(G)=v∈V(G)d(v)α,其中d(v)为图G的顶点v的度,α为任意实数.研究了树的零阶广义Rα指标的极值问题,利用分析和图的理论,确定了任意给定最大匹配数的树的最大和最小Rα的值,并刻画了达到该极值的树.  相似文献   

15.
Let a text of u characters over an alphabet of size σ be compressible to n phrases by the LZ78 algorithm. We show how to build a data structure based on the Ziv–Lempel trie, called the LZ-index, that takes 4nlog2n(1+o(1)) bits of space (that is, 4 times the entropy of the text for ergodic sources) and reports the R occurrences of a pattern of length m in worst case time O(m3logσ+(m+R)logn). We present a practical implementation of the LZ-index, which is faster than current alternatives when we take into consideration the time to report the positions or text contexts of the occurrences found.  相似文献   

16.
The matching energy of a graph is defined as the sum of the absolute values of the zeros of its matching polynomial. For any integer t≥1, a graph G is called t‐apex tree if there exists a t‐set such that G ? X is a tree, while for any with , G ? Y is not a tree. Let be the set of t‐apex trees of order n. In this article, we determine the extremal graphs from with minimal and maximal matching energies, respectively. Moreover, as an application, the extremal cacti of order n and with s cycles have been completely characterized at which the minimal matching energy are attained. © 2015 Wiley Periodicals, Inc. Complexity 21: 238–247, 2016  相似文献   

17.
In a previous paper (C. R. Acad. Sci. Paris Sér. I 333 (2001) 763–768), the author introduced a notion of compatibility between a Poisson structure and a pseudo-Riemannian metric. In this paper, we introduce a new class of Lie algebras called pseudo-Riemannian Lie algebras. The two notions are closely related: we prove that the dual of a Lie algebra endowed with its canonical linear Poisson structure carries a compatible pseudo-Riemannian metric if and only if the Lie algebra is a pseudo-Riemannian Lie algebra. Moreover, the Lie algebra obtained by linearizing at a point a Poisson manifold with a compatible pseudo-Riemannian metric is a pseudo-Riemannian Lie algebra. We also give some properties of the symplectic leaves of such manifolds, and we prove that every Poisson manifold with a compatible Riemannian metric is unimodular. Finally, we study Poisson Lie groups endowed with a compatible pseudo-Riemannian metric, and we give the classification of all pseudo-Riemannian Lie algebras of dimension 2 and 3.  相似文献   

18.
The second order hypergeometric q-difference operator is studied for the value c = −q. For certain parameter regimes the corresponding recurrence relation can be related to a symmetric operator on the Hilbert space ℓ2( ). The operator has deficiency indices (1, 1) and we describe as explicitly as possible the spectral resolutions of the self-adjoint extensions. This gives rise to one-parameter orthogonality relations for sums of two 21-series. In particular, we find that the Ismail-Zhang q-analogue of the exponential function satisfies certain orthogonality relations.  相似文献   

19.
This paper considers three checkpointing schemes which combine a double modular redundancy and three types of checkpoints: compare-and-store-checkpoint (CSCP), store-checkpoint (SCP), and compare-checkpoint (CCP). An execution time of a task is divided equally into n intervals, and at the end of each interval, a CSCP is always placed. Further, each CSCP interval is also divided equally into m intervals, and at the end of each interval, either CCP or SCP is placed except the last one. Introducing the overheads of comparison, storage, and retry, the mean execution times to complete a task for three schemes are obtained, using the theory of probability. Optimal checkpointing intervals, which minimize the mean times, are analytically derived, and are numerically computed. Three schemes are compared as numerical examples and the best checkpointing scheme is chosen.  相似文献   

20.
In (Comm. Math. Phys. 188 (1997) 121–133) Herzlich proved a new positive mass theorem for Riemannian 3-manifolds (N,g) whose mean curvature of the boundary allows some positivity. In this paper we study what happens to the limit case of the theorem when, at a point of the boundary, the smallest positive eigenvalue of the Dirac operator of the boundary is strictly larger than one-half of the mean curvature (in this case the mass m(g) must be strictly positive). We prove that the mass is bounded from below by a positive constant c(g), m(g)c(g), and the equality m(g)=c(g) holds only if, outside a compact set, (N,g) is conformally flat and the scalar curvature vanishes. The constant c(g) is uniquely determined by the metric g via a Dirac-harmonic spinor.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号