首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
New text indexing functionalities of the compressed suffix arrays are proposed. The compressed suffix array proposed by Grossi and Vitter is a space-efficient data structure for text indexing. It occupies only O(n) bits for a text of length n; however it also uses the text itself that occupies bits for the alphabet . In this paper we modify the data structure so that pattern matching can be done without any access to the text. In addition to the original functions of the compressed suffix array, we add new operations search, decompress and inverse to the compressed suffix arrays. We show that the new index can find occ occurrences of any substring P of the text in O(|P|logn+occlogεn) time for any fixed 1ε>0 without access to the text. The index also can decompress a part of the text of length m in O(m+logεn) time. For a text of length n on an alphabet such that , our new index occupies only bits where is the order-0 entropy of the text. Especially for ε=1 the size is bits. Therefore the index will be smaller than the text, which means we can perform fast queries from compressed texts.  相似文献   

2.
This paper generalizes the dynamic text indexing problem, introduced in [15], to insertion and deletion of strings. The problem is to quickly answer on-line queries about the occurrences of arbitrary pattern strings in a text that may change due to insertion or deletion of strings from it. To treat strings as atomic objects, we provide new sequential techniques and related data structures, which combine the suffix tree with the naming technique used in parallel computation on strings. We also introduce a geometric interpretation of the problem of finding the occurrences of a pattern in a given substring of the text. As a result, the algorithm allows the insertion in the text of a stringSinO(|S| log(n + |S|)) amortized time, and the deletion from the text of a stringSinO(|S| log n) amortized time, wherenis the length of the current text. A pattern search requiresO(p log p + upd ( + log p) + pocc) worst-case time, wherepis the length of the pattern andpoccis the number of its occurrences in the current text, obtained after the execution ofupdupdate operations. This solution requiresO(n2 log n) space, which is not initialized.We also provide a technique to reduce the space toO(n log n), yielding a solution that requiresO((p + upd) log p + pocc) query time in the worst-case,O(|S| log3/2(|S| + n)) amortized time to insert a stringSin, andO(|S| log3/2n) amortized time to delete a stringSfrom the current text.Furthermore, we use our techniques to solve the novel on-line dynamic tree matching problem that requires the on-line detection of the occurrences of an arbitrary subtree in a forest of ordered labeled trees. The forest may change due to insertion or deletion of subtrees or by renaming of some nodes. Such a problem is solved by a simple reduction to the dynamic text indexing problem.  相似文献   

3.
In this paper, we consider the problem of computing the region visible to a query point located in a given polygonal domain. The polygonal domain is specified by a simple polygon with m holes and a total of n vertices. We provide two bounds on the complexity of this problem. One approach constructs a data structure with space complexity O(n2) in time O(n2lgn) and yields a query time of O((1+min(m,|V(q)|))lg2n+m+|V(q)|). Here, V(q) represents the set of vertices of the visibility polygon of a query point q, and |E| denotes the number of edges in the visibility graph. The other approach provides a data structure with space complexity O(min(|E|,mn)+n) in time O(T+|E|+nlgn) with the query time of O(|V(q)|lgn+m). Here, T is the time to triangulate the given polygonal region (which is O(n+mlg1+m) for a small positive constant >0). In both of these approaches, O(m) additive factor in the query time is eliminated with an additional O((min(|E|,mn))2) space and an additional O(m(min(|E|,mn))2) preprocessing time.  相似文献   

4.
Let F be a family of mutually nonoverlapping unit balls in the n -dimensional Euclidean space Rn. The distance between the centres of A,B   F is denoted by d(A, B). We prove, among others, that if d(A, B)  <  4 and n ≥  5, then A andB are always visible from each other, that is, a light ray emanating from the surface of A reaches B without being blocked by other unit balls. Furthermore, if d(A, B)  < 2n / 2, then any small “shake’ of F can make A, B visible from each other.  相似文献   

5.
The enumeration of transitive ordered factorizations of a given permutation is a combinatorial problem related to singularity theory. Let n ≥ 1, and let σ0 be a permutation of n having di cycles of length i, for i ≥ 1. Let m ≥ 2. We prove that the number of m-tuples (σ1,…,σm) of permutation of n such that
• σσ···σ = σ,
• the group generated by σ,…,σ acts transitively on {1, 2,…,},
• ∑ (σ) = ( − 1) + 2, where (σ) denotes the number of cycles of σ
A one-to-one correspondence relates these m-tuples to some rooted planar maps, which we call constellations and enumerate via a bijection with some bicolored trees. For m = 2, we recover a formula of Tutte for the number of Eulerian maps. The proof relies on the idea that maps are conjugacy classes of trees and extends the method previously applied to Eulerian maps by the second author. Our result might remind the reader of an old theorem of Hurwitz, giving the number of m-tuples of transpositions satisfying the above conditions. Indeed, we show that our result implies Hurwitz' theorem. We also briefly discuss its implications for the enumeration of nonequivalent coverings of the sphere.  相似文献   

6.
The compressed matching problem is the problem of finding all occurrences of a pattern in a compressed text. In this paper we discuss the 2-dimensional compressed matching problem in Lempel–Ziv compressed images. Given a pattern P of (uncompressed) size m×m, and a text T of (uncompressed) size n×n, both in 2D-LZ compressed form, our algorithm finds all occurrences of P in T. The algorithm is strongly inplace, that is, the amount of extra space used is proportional to the best possible compression of a pattern of size m2. The best compression that the 2D-LZ technique can obtain for a file of size m2 is O(m). The time for performing the search is O(n2) and the preprocessing time is O(m3). Our algorithm is general in the sense that it can be used for any 2D compression which can be sequentially decompressed in small space.  相似文献   

7.
Let a text string T of n symbols and a pattern string P of m symbols from alphabet Σ be given. A swapped version T′ of T is a length n string derived from T by a series of local swaps (i.e., t ← tℓ + 1 and tℓ + 1 ← t), where each element can participate in no more than one swap. The pattern matching with swaps problem is that of finding all locations i for which there exists a swapped version T′ of T with an exact matching of P in location i of T′. It has been an open problem whether swapped matching can be done in less than O(nm) time. In this paper we show the first algorithm that solves the pattern matching with swaps problem in time o(nm). We present an algorithm whose time complexity is O(nm1/3 log m log σ) for a general alphabet Σ, where σ = min(m,Σ).  相似文献   

8.
Let a text of u characters over an alphabet of size σ be compressible to n phrases by the LZ78 algorithm. We show how to build a data structure based on the Ziv–Lempel trie, called the LZ-index, that takes 4nlog2n(1+o(1)) bits of space (that is, 4 times the entropy of the text for ergodic sources) and reports the R occurrences of a pattern of length m in worst case time O(m3logσ+(m+R)logn). We present a practical implementation of the LZ-index, which is faster than current alternatives when we take into consideration the time to report the positions or text contexts of the occurrences found.  相似文献   

9.
Already in his Lectures on Search [A. Rényi, Lectures on the theory of search, University of North Carolina, Chapel Hill, Institute of Statistics, Mimeo Series No. 6007, 1969. [11]] Renyi suggested to consider a search problem, where an unknown is to be found by asking for containment in a minimal number m(n,k) of subsets A1,…,Am with the restrictions |Ai|k<n/2 for i=1,2,…,m.Katona gave in 1966 the lower bound m(n,k)logn/h(k/n) in terms of binary entropy and the upper bound m(n,k)(logn+1)/logn/k·n/k, which was improved by Wegener in 1979 to m(n,k)logn/logn/k(n/k-1).We prove here for k=pn that m(n,k)=logn+o(logn)/h(p), that is, ratewise optimality of the entropy bound: .Actually this work was motivated by a more recent study of Karpovsky, Chakrabarty, Levitin and Avresky of a problem on fault diagnosis in hypercubes, which amounts to finding the minimal number M(n,r) of Hamming balls of radius r=ρn with in the Hamming space , which separate the vertices. Their bounds on M(n,r) are far from being optimal. We establish bounds implying
However, it must be emphasized that the methods of prove for our two upper bounds are quite different.  相似文献   

10.
We present a data structure for ray-shooting queries in a set of convex fat polyhedra of total complexity n in . The data structure uses O(n2+ε) storage and preprocessing time, and queries can be answered in O(log2n) time. A trade-off between storage and query time is also possible: for any m with n<m<n2, we can construct a structure that uses O(m1+ε) storage and preprocessing time such that queries take time.We also describe a data structure for simplex intersection queries in a set of n convex fat constant-complexity polyhedra in . For any m with n<m<n3, we can construct a structure that uses O(m1+ε) storage and preprocessing time such that all polyhedra intersecting a query simplex can be reported in O((n/m1/3)logn+k) time, where k is the number of answers.  相似文献   

11.
A Range Minimum Query asks for the position of a minimal element between two specified array-indices. We consider a natural extension of this, where our further constraint is that if the minimum in a query interval is not unique, then the query should return an approximation of the median position among all positions that attain this minimum. We present a succinct preprocessing scheme using Dn + o(n) bits in addition to the static input array (small constant D), such that subsequent “range median of minima queries” can be answered in constant time. This data structure can be built in linear time, with little extra space needed at construction time. We introduce several new combinatorial concepts such as Super-Cartesian Trees and Super-Ballot Numbers. We give applications of our preprocessing scheme in text indexes such as (compressed) suffix arrays and trees.  相似文献   

12.
In this paper we present an algorithm to compute the rectilinear geodesic voronoi neighbor of an arbitrary query pointqamong a setSofmpoints in the presence of a set ofnvertical line segment obstacles inside a rectangular floor. The distance between a pair of points α and β is the shortest rectilinear distance avoiding the obstacles in and is denoted by δ(α, β). The rectilinear geodesic voronoi neighbor of an arbitrary query pointq,RGVN(q) is the pointpiSsuch that δ(q, pi) is minimum. The algorithm suggests a preprocessing of the elements of the setsSand inO((m + n)log(m + n)) time such that for an arbitrary query pointq, theRGVNquery can be answered inO(log(m + n)) time. The space required for storing the preprocessed information isO(n + m log m). If the points inSare placed on the boundary of the rectangular floor, a different technique is adopted to decrease the space complexity toO(m + n). This technique works even if the obstacles are rectangles instead of line segments. Finally, the parallelization of the preprocessing steps for the latter algorithm is suggested, which takesO(log3(m + n)) time, usingO((m + n)1.5/log2(m + n)) processors andO(log(m + n)) query time.  相似文献   

13.
《Journal of Complexity》1999,15(1):30-71
We describe fast parallel algorithms for building index data structures that can be used to gather various statistics on square matrices. The main data structure is the Lsuffix tree, which is a generalization of the classical suffix tree for strings. Given ann×ntext matrixA, we build our data structures inO(log n) time withn2processors on a CRCW PRAM, so that we can quickly processAin parallel as follows: (i) report some statistical information aboutA, e.g., find the largest repeated square submatrices that appear at least twice inAor determine, for each position inA, the smallest submatrix that occurs only there; (ii) given, on-line, anm×mpattern matrixPAT, check whether it occurs inA. We refer to the above two kinds of operations as queries and point out that they have applications to visual databases and two-dimensional data compression. Query (i) takesO(log n) time withn2/log nprocessors and query (ii) takesO(log m) time withm2/log mprocessors. The query algorithms are work optimal while the construction algorithm is work optimal only for arbitrary and large alphabets.  相似文献   

14.
We consider Las Vegas randomized dynamic algorithms for on-line connectivity problems with deletions only. In particular, we show that starting from a graph with m edges and n nodes, we can maintain a spanning forest during m deletions in O(m log(n2/m) + n(log n)3(log log n)2) expected time, which is O(m) if m = Θ(n2) and O(m log n) if m = Ω(n(log n log log n)2). The deletions may be interspersed with connectivity queries, each of which is answered in constant time. The previous best bound was O(m log2 n) by Henzinger and Thorup which covered both insertions and deletions. The result is based on a general randomized reduction for edge connectivity problems of many deletions-only queries to a few deletions and insertions queries. For 2-edge connectivity, the complexity is improved from O(m(log n)5) to O(m log(n2/m) + n(log n)6(log log n)2). For the general decremental k-edge-connectivity problem, we get a total running time of O(k2n2 polylog n). Here the previous best bound was O(kmn polylog n). Improved running times are also achieved for the static consensus tree problem, with applications to computational biology and relational data bases.  相似文献   

15.
We revisit one of the most fundamental classes of data structure problems in computational geometry: range searching. Matoušek (Discrete Comput. Geom. 10:157–182, 1993) gave a partition tree method for d-dimensional simplex range searching achieving O(n) space and O(n 1−1/d ) query time. Although this method is generally believed to be optimal, it is complicated and requires O(n 1+ε ) preprocessing time for any fixed ε>0. An earlier method by Matoušek (Discrete Comput. Geom. 8:315–334, 1992) requires O(nlogn) preprocessing time but O(n 1−1/d log O(1) n) query time. We give a new method that achieves simultaneously O(nlogn) preprocessing time, O(n) space, and O(n 1−1/d ) query time with high probability. Our method has several advantages:
•  It is conceptually simpler than Matoušek’s O(n 1−1/d )-time method. Our partition trees satisfy many ideal properties (e.g., constant degree, optimal crossing number at almost all layers, and disjointness of the children’s cells at each node).  相似文献   

16.
The theoretical presentation and analysis is given for two families of simple in-place merging algorithms and their limiting cases. The first family merges stably inO(k·n) time andO(n 1/k ) additional space with a limiting case running inO(n logn) time and constant space. The second family merges unstably inO (k ·n) time andO(log k n) space with a limiting case running inO(nG(n)) time and constant space. HereG(n) is the leastk such thatF(k) n whereF(0)=1 andF(i)=2 F(i–1) fori1. Each algorithm gives rise to a corresponding merge sort.  相似文献   

17.
We propose succinct data structures for text retrieval systems supporting document listing queries and ranking queries based on the tf*idf (term frequency times inverse document frequency) scores of documents. Traditional data structures for these problems support queries only for some predetermined keywords. Recently Muthukrishnan proposed a data structure for document listing queries for arbitrary patterns at the cost of data structure size. For computing the tf*idf scores there has been no efficient data structures for arbitrary patterns.Our new data structures support these queries using small space. The space is only 2/ times the size of compressed documents plus 10n bits for a document collection of length n, for any 0<1. This is much smaller than the previous O(nlogn) bit data structures. Query time is O(m+qlogn) for listing and computing tf*idf scores for all q documents containing a given pattern of length m. Our data structures are flexible in a sense that they support queries for arbitrary patterns.  相似文献   

18.
In this paper we present efficient deterministic algorithms for various problems involving lines or segments in the plane, using the partitioning algorithm described in a companion paper [A3]. These applications include: (i) anO(m 2/3 n 2/3 · log2/3 n · log/3 (m/n)+(m+n) logn) algorithm to compute all incidences betweenm points andn lines, where is a constant <3.33; (ii) anO(m 2/3 n 2/3 · log5/3 n · log/3 (m/n)+(m+n) logn) algorithm to computem faces in an arrangement ofn lines; (iii) anO(n 4/3 log(+2)/3 n) algorithm to count the number of intersections in a set ofn segments; (iv) anO(n 4/3 log( + 2)/3 n) algorithm to count red-blue intersections between two sets of segments, and (v) anO(n 3/2 log/3 n) algorithm to compute spanning trees with low stabbing number for a set ofn points. We also present an algorithm that, given set ofn points in the plane, preprocesses it, in timeO(nm log+1/2 n), into a data structure of sizeO(m) forn lognmn 2, so that the number of points ofS lying inside a query triangle can be computed inO((n/m) log3/2 n) time.Work on this paper has been supported by Office of Naval Research Grant N00014-87-K-0129, by National Science Foundation Grant DCR-83-20085, and by grants from the Digital Equipment Corporation and the IBM Corporation. A preliminary version of this paper appears in theProceedings of the 5th ACM Symposium on Computational Geometry, 1989, pp. 11–22.  相似文献   

19.
Character sets of strings   总被引:2,自引:1,他引:1  
Given a string S over a finite alphabet Σ, the character set (also called the fingerprint) of a substring S of S is the subset CΣ of the symbols occurring in S. The study of the character sets of all the substrings of a given string (or a given collection of strings) appears in several domains such as rule induction for natural language processing or comparative genomics. Several computational problems concerning the character sets of a string arise from these applications, especially:
(1) Output all the maximal locations of substrings having a given character set.
(2) Output for each character set C occurring in a given string (or a given collection of strings) all the maximal locations of C.
Denoting by n the total length of the considered string or collection of strings, we solve the first problem in Θ(n) time using Θ(n) space. We present two algorithms solving the second problem. The first one runs in Θ(n2) time using Θ(n) space. The second algorithm has Θ(n|Σ|log|Σ|) time and Θ(n) space complexity and is an adaptation of an algorithm by Amir et al. [A. Amir, A. Apostolico, G.M. Landau, G. Satta, Efficient text fingerprinting via Parikh mapping, J. Discrete Algorithms 26 (2003) 1–13].  相似文献   

20.
We present a new pivot-based algorithm which can be used with minor modification for the enumeration of the facets of the convex hull of a set of points, or for the enumeration of the vertices of an arrangement or of a convex polyhedron, in arbitrary dimension. The algorithm has the following properties:
(a)  Virtually no additional storage is required beyond the input data.
(b)  The output list produced is free of duplicates.
(c)  The algorithm is extremely simple, requires no data structures, and handles all degenerate cases.
(d)  The running time is output sensitive for nondegenerate inputs.
(e)  The algorithm is easy to parallelize efficiently.
For example, the algorithm finds thev vertices of a polyhedron inR d defined by a nondegenerate system ofn inequalities (or, dually, thev facets of the convex hull ofn points inR d, where each facet contains exactlyd given points) in timeO(ndv) andO(nd) space. Thev vertices in a simple arrangement ofn hyperplanes inR d can be found inO(n 2 dv) time andO(nd) space complexity. The algorithm is based on inverting finite pivot algorithms for linear programming.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号