Similar documents
20 similar documents found.
1.
String matching is the problem of finding all occurrences of a pattern in a text. We present a new method to compute the combinatorial shift function (“matching shift”) of the well-known Boyer–Moore string matching algorithm. This method requires computing the length of the longest suffix of the pattern ending at each position in the pattern. These values constitute an extra preprocessing step for a variant of the Boyer–Moore algorithm designed by Apostolico and Giancarlo. We give here a new presentation of this algorithm that avoids the extra preprocessing, together with a tight bound of 1.5n character comparisons (where n is the length of the text).
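For concreteness, here is a small sketch (in Python, not taken from the paper) of computing the table described above — the length of the longest suffix of the pattern ending at each position — via the standard textbook routine used in Boyer–Moore good-suffix preprocessing; the paper's contribution is precisely a presentation that avoids this extra step.

```python
def suffix_lengths(pattern):
    """suff[i] = length of the longest suffix of `pattern` that ends at position i
    (classical preprocessing for the Boyer-Moore good-suffix rule)."""
    m = len(pattern)
    suff = [0] * m
    suff[m - 1] = m
    f, g = 0, m - 1
    for i in range(m - 2, -1, -1):
        if i > g and suff[i + m - 1 - f] < i - g:
            # the value can be copied from an earlier, already matched position
            suff[i] = suff[i + m - 1 - f]
        else:
            # otherwise extend the match character by character
            g = min(g, i)
            f = i
            while g >= 0 and pattern[g] == pattern[g + m - 1 - f]:
                g -= 1
            suff[i] = f - g
    return suff

print(suffix_lengths("abab"))  # -> [0, 2, 0, 4]
```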

2.
In the dynamic text indexing problem, a text string has to be maintained under string insertions and deletions in order to answer on-line queries about arbitrary pattern occurrences. By means of some new techniques and data structures, we achieve improved worst-case bounds. We show that finding all pocc occurrences of a pattern of length p in the current text of length n takes O(p + pocc + upd log p + log n) time, where upd is the number of text updates performed so far; inserting or deleting a string of length s from the current text takes O(s log(s + n)) time.

3.
We introduce a new multidimensional pattern matching problem that is a natural generalization of string matching, a well-studied problem [1]. The motivation for its algorithmic study is mainly theoretical. Let A[1:n_1, …, 1:n_d] be a text matrix with N = n_1 ⋯ n_d entries and B[1:m_1, …, 1:m_r] be a pattern matrix with M = m_1 ⋯ m_r entries, where d ≥ r ≥ 1 (the matrix entries are taken from an ordered alphabet Σ). We study the problem of checking whether some r-dimensional submatrix of A is equal to B (i.e., a decision query). A can be preprocessed and B is given on-line. We define a new data structure for preprocessing A and propose CRCW-PRAM algorithms that build it in O(log N) time with N^2/n_max processors, where n_max = max(n_1, …, n_d), such that the decision query for B takes O(M) work and O(log M) time. By using known techniques, we would get the same preprocessing bounds but an O((d choose r) M) work bound for the decision query. The latter bound is undesirable since it can depend exponentially on d; our bound, in contrast, is independent of d and optimal. We can also answer, in optimal work, two further types of queries: (a) an enumeration query retrieving all the r-dimensional submatrices of A that are equal to B, and (b) an occurrence query retrieving only the distinct positions in A that correspond to all of these submatrices. As a byproduct, we also derive the first efficient sequential algorithms for the new problem.

4.
New text indexing functionalities of the compressed suffix array are proposed. The compressed suffix array proposed by Grossi and Vitter is a space-efficient data structure for text indexing. It occupies only O(n) bits for a text of length n; however, it also requires the text itself, which occupies n log|Σ| bits for an alphabet Σ. In this paper we modify the data structure so that pattern matching can be done without any access to the text. In addition to the original functions of the compressed suffix array, we add new operations search, decompress and inverse. We show that the new index can find the occ occurrences of any substring P of the text in O(|P| log n + occ log^ε n) time, for any fixed ε with 0 < ε ≤ 1, without access to the text. The index can also decompress a part of the text of length m in O(m + log^ε n) time. For a text of length n on a sufficiently small alphabet, the new index occupies space proportional to nH_0 bits, where H_0 is the order-0 entropy of the text; in particular, for ε = 1 the index is smaller than the text itself, which means we can perform fast queries directly on compressed texts.
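As a rough illustration of the operation semantics only, here is a toy, uncompressed suffix-array index in Python; the class and method names are invented for the example, "inverse" is taken here to be the inverse permutation of the suffix array, and nothing below reflects the compressed structure's space or time bounds.

```python
class PlainSuffixIndex:
    """Toy, uncompressed stand-in illustrating search / decompress / inverse."""

    def __init__(self, text):
        self.text = text
        # Naive suffix-array construction; real indexes build this far faster.
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])
        # inv[i] = rank of the suffix starting at text position i.
        self.inv = [0] * len(text)
        for r, i in enumerate(self.sa):
            self.inv[i] = r

    def _bound(self, pattern, upper):
        # Binary search over suffixes compared by their first len(pattern) chars.
        lo, hi = 0, len(self.sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = self.text[self.sa[mid]:self.sa[mid] + len(pattern)]
            if prefix < pattern or (upper and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    def search(self, pattern):
        """All starting positions of `pattern` in the text."""
        lo, hi = self._bound(pattern, False), self._bound(pattern, True)
        return sorted(self.sa[lo:hi])

    def decompress(self, start, length):
        """Return text[start:start+length]; a real compressed index does this without the text."""
        return self.text[start:start + length]

    def inverse(self, position):
        """Rank of the suffix starting at `position` (inverse of the suffix array)."""
        return self.inv[position]


idx = PlainSuffixIndex("banana")
print(idx.search("ana"))     # -> [1, 3]
print(idx.decompress(1, 3))  # -> 'ana'
print(idx.inverse(1))        # rank of the suffix "anana" in sorted order
```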

5.
In this paper, we consider the pattern matching problem in DNA and RNA sequences where either the pattern or the text can be degenerate, i.e., contain sets of characters. We present an asymptotically faster algorithm for the above problem that works in O(n log m) time, where n and m are the lengths of the text and the pattern, respectively. We also suggest an efficient implementation of our algorithm, which works in linear time when the pattern size is small. Finally, we also describe how our approach can be used to solve the distributed pattern matching problem. A preliminary version of this paper appeared in [26].
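The problem statement can be made concrete with a brute-force O(nm) check — a position matches when the text set and the pattern set share at least one symbol. The sketch below is only the definition, not the paper's O(n log m) algorithm.

```python
def degenerate_occurrences(text, pattern):
    """text and pattern are sequences of character sets, e.g. [{'A'}, {'C','T'}, ...]."""
    n, m = len(text), len(pattern)
    occurrences = []
    for i in range(n - m + 1):
        # a window matches if every aligned pair of sets intersects
        if all(text[i + j] & pattern[j] for j in range(m)):
            occurrences.append(i)
    return occurrences

# Example: an IUPAC-style degenerate symbol N = {A, C, G, T} in the pattern.
text = [set(c) for c in "ACGTACGT"]
pattern = [{'A'}, {'A', 'C', 'G', 'T'}, {'G'}]
print(degenerate_occurrences(text, pattern))  # -> [0, 4]
```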

6.
The compressed matching problem is the problem of finding all occurrences of a pattern in a compressed text. In this paper we discuss the 2-dimensional compressed matching problem in Lempel–Ziv compressed images. Given a pattern P of (uncompressed) size m×m, and a text T of (uncompressed) size n×n, both in 2D-LZ compressed form, our algorithm finds all occurrences of P in T. The algorithm is strongly in-place, that is, the amount of extra space used is proportional to the best possible compression of a pattern of size m^2. The best compression that the 2D-LZ technique can obtain for a file of size m^2 is O(m). The time for performing the search is O(n^2) and the preprocessing time is O(m^3). Our algorithm is general in the sense that it can be used for any 2D compression which can be sequentially decompressed in small space.

7.
This paper generalizes the dynamic text indexing problem, introduced in [15], to insertion and deletion of strings. The problem is to quickly answer on-line queries about the occurrences of arbitrary pattern strings in a text that may change due to insertion or deletion of strings from it. To treat strings as atomic objects, we provide new sequential techniques and related data structures, which combine the suffix tree with the naming technique used in parallel computation on strings. We also introduce a geometric interpretation of the problem of finding the occurrences of a pattern in a given substring of the text. As a result, the algorithm allows the insertion into the text of a string S in O(|S| log(n + |S|)) amortized time, and the deletion from the text of a string S in O(|S| log n) amortized time, where n is the length of the current text. A pattern search requires O(p log p + upd( + log p) + pocc) worst-case time, where p is the length of the pattern and pocc is the number of its occurrences in the current text, obtained after the execution of upd update operations. This solution requires O(n^2 log n) space, which need not be initialized. We also provide a technique to reduce the space to O(n log n), yielding a solution that requires O((p + upd) log p + pocc) query time in the worst case, O(|S| log^{3/2}(|S| + n)) amortized time to insert a string S into, and O(|S| log^{3/2} n) amortized time to delete a string S from, the current text. Furthermore, we use our techniques to solve the novel on-line dynamic tree matching problem, which requires the on-line detection of the occurrences of an arbitrary subtree in a forest of ordered labeled trees. The forest may change due to insertion or deletion of subtrees or by renaming of some nodes. Such a problem is solved by a simple reduction to the dynamic text indexing problem.

8.
The main result of this paper is an algorithm for approximate matching of a regular expression of size m in a text of size n in time O(nm / log^{d+2} n), where d is the number of allowed errors. This algorithm is the first o(mn) algorithm for approximate matching of regular expressions.

9.
We first give a representation of a suffix tree that uses n lg n + O(n) bits of space and supports searching for a pattern string in the given text (over a fixed-size alphabet) in O(m) time, where n is the size of the text and m is the length of the pattern. The structure is quite simple and answers a question raised by Muthukrishnan (in Proceedings of the FST and TCS Preconference Workshop on Randomization, 1997, pp. 23–27). Previous compact representations of suffix trees either had a higher lower-order term in space and relied on some expectation assumption, or required more time for searching. When the size of the alphabet k is not viewed as a constant, this structure can be modified either to use the same space but take O(m lg k) time for string searching, or to use an additional O(n lg k) bits but take the same O(m) time for searching. We then give several index structures for binary texts that use less space, including:
• a structure that uses a suffix array (n lg n bits) plus some additional bits,
• an indexing structure that takes fewer bits of space, and
• an o(n lg n)-bit structure which answers, in O(m) time, the decision question of whether a given pattern of length m occurs in the text.
Each of these structures uses a different technique, either in the storage scheme or in the search algorithm, to reduce the space requirement. The first one uses a suffix array, a sparse suffix tree, and a table structure. Finding all the occurrences of a pattern using this structure takes O(m + s) time, where s is the number of occurrences of the pattern in the text. The second structure constructs a sparse suffix tree for all the suffixes that start with the bit that occurs more often in the given binary text. The last structure uses an iterative algorithm to search for the pattern. This structure is the first o(n lg n)-bit index to support the decision version of indexing queries in time linear in the length of the pattern, but it does not support general indexing queries where we want to find the positions of the occurrences of the pattern. Our main contribution is the development of techniques to use the succinct representation of trees through balanced parentheses for suffix trees.
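The balanced-parentheses encoding mentioned in the last sentence can be illustrated in a few lines: each node contributes one '(' when first visited and one ')' when left, so an n-node ordered tree takes 2n bits. The nested-tuple tree format below is an assumption made for the example.

```python
def to_balanced_parens(tree):
    """tree = (label, [children...]); returns the balanced-parenthesis string."""
    label, children = tree
    return "(" + "".join(to_balanced_parens(c) for c in children) + ")"

# Example: a root with three children, the middle child having two children.
tree = ("r", [("a", []), ("b", [("x", []), ("y", [])]), ("c", [])])
print(to_balanced_parens(tree))  # -> (()(()())())   (6 nodes, 12 bits)
```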

10.
The dynamic planar point location problem is the task of maintaining a dynamic set S of n nonintersecting (except possibly at endpoints) line segments in the plane under the following operations:
• Locate(q: point): Report the segment immediately above q, i.e., the first segment intersected by an upward vertical ray starting at q;
• Insert(s: segment): Add segment s to the collection of segments;
• Delete(s: segment): Remove segment s from the collection of segments.
We present a solution which requires space O(n) and has query and insertion time O(log n log log n) and deletion time O(log^2 n). The bounds for insertions and deletions are amortized. A query time below O(log^2 n) was previously only known for monotone subdivisions and subdivisions consisting of horizontal segments and required nonlinear space.
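To make the three operations concrete, here is a deliberately naive, list-based sketch with O(n) Locate time; it shares nothing with the paper's structure beyond the interface, and all names are invented for the example.

```python
class NaivePointLocation:
    def __init__(self):
        self.segments = []                       # list of ((x1, y1), (x2, y2))

    def insert(self, seg):
        self.segments.append(seg)

    def delete(self, seg):
        self.segments.remove(seg)

    def locate(self, q):
        """Segment hit first by an upward vertical ray from q (None if there is none)."""
        qx, qy = q
        best, best_y = None, float("inf")
        for (x1, y1), (x2, y2) in self.segments:
            if min(x1, x2) <= qx <= max(x1, x2) and x1 != x2:
                y = y1 + (y2 - y1) * (qx - x1) / (x2 - x1)   # segment height at x = qx
                if qy <= y < best_y:
                    best, best_y = ((x1, y1), (x2, y2)), y
            # vertical segments (x1 == x2) are skipped in this toy version
        return best


pl = NaivePointLocation()
pl.insert(((0, 1), (3, 1)))
pl.insert(((0, 2), (3, 2)))
print(pl.locate((1.5, 0)))  # -> ((0, 1), (3, 1)), the lower of the two segments
```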

11.
12.
We present the first nontrivial algorithm for approximate pattern matching on compressed text. The format we choose is the Ziv–Lempel family. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to k insertions, deletions and substitutions. On LZ78/LZW we need O(mkn + R) time in the worst case and O(k^2 n + mk·min(n, (mσ)^k) + R) on average, where σ is the alphabet size. The experimental results show a practical speedup over the basic approach of up to 2X for moderate m and small k. We extend the algorithms to more general compression formats and approximate matching models.
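For reference, the "basic approach" being compared against amounts to decompressing the text and running a standard k-differences scan. Below is a sketch of that classical dynamic program over the uncompressed text (O(mu) time), not the paper's compressed-domain algorithm.

```python
def k_difference_occurrences(text, pattern, k):
    """End positions in `text` where `pattern` matches with at most k edits."""
    m = len(pattern)
    ends = []
    # col[i] = edit distance between pattern[:i] and the best substring ending here
    col = list(range(m + 1))
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0                                        # an occurrence may start anywhere
        for i in range(1, m + 1):
            prev_diag, col[i] = col[i], min(
                col[i] + 1,                               # skip the text character
                col[i - 1] + 1,                           # skip the pattern character
                prev_diag + (pattern[i - 1] != c),        # match or substitute
            )
        if col[m] <= k:
            ends.append(j)
    return ends

print(k_difference_occurrences("surgery", "survey", 2))   # -> [5, 6, 7]
```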

13.
This paper investigates the existence of linear-space data structures for range searching. We examine the homothetic range search problem, where a set S of n points in the plane is to be preprocessed so that for any triangle T with sides parallel to three fixed directions the points of S that lie in T can be computed efficiently. We also look at domination searching in three dimensions. In this problem, S is a set of n points in E^3 and the question is to retrieve all points of S that are dominated by some query point. We describe linear-space data structures for both problems. The query time is optimal in the first case and nearly optimal in the second. This research was conducted while the first author was with Brown University and the second author was with the Technical University of Graz, Austria. The first author was supported in part by NSF Grant MCS 83-03925.

14.
We present an improved space/query-time tradeoff for the general simplex range searching problem, matching known lower bounds up to small polylogarithmic factors. In particular, we construct a linear-space simplex range searching data structure with O(n^{1−1/d}) query time, which is optimal for d = 2 and probably also for d > 2. Further, we show that multilevel range searching data structures can be built with only a polylogarithmic overhead in space and query time per level (the previous solutions require at least a small fixed power of n). We show that Hopcroft's problem (detecting an incidence among n lines and n points) can be solved in time n^{4/3} 2^{O(log* n)}. In all these algorithms we apply Chazelle's results on computing optimal cuttings. Part of this research was done during the First Utrecht Computational Geometry Workshop, supported by the Dutch Organization for Scientific Research (NWO).

15.
Three related rectangle intersection problems in k-dimensional space are considered: (1) find the intersections of a rectangle with a given set of rectangles, (2) find the intersecting pairs of rectangles as they are inserted into or deleted from an existing set of rectangles, and (3) find the intersecting pairs of a given set of rectangles. By transforming these problems into range search problems, one need not divide the intersection problem into two subproblems, namely, the edge-intersecting problem and the containment problem, as done by many previous studies. Furthermore, this approach can also solve these subproblems separately, if required. For the first problem the running time is O((log n)^{2k−1} + s), where s is the number of intersecting pairs of rectangles. For the second problem the time needed to generate and maintain n rectangles is O(n (log n)^{2k}) and the time for each query is O((log n)^{2k−1} + s). For the third problem the total time is O(n log n + n (log n)^{2(k−1)} + s) for k ≥ 1.
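The elementary fact these problems build on is that two axis-parallel k-dimensional rectangles intersect iff their intervals overlap in every dimension. The sketch below applies it naively to problem (3); it is only the quadratic baseline, not the range-search-based solution of the paper.

```python
def rectangles_intersect(a, b):
    """a and b are lists of (low, high) interval pairs, one pair per dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))

def intersecting_pairs(rects):
    # Brute-force enumeration of all pairs: O(n^2 k) time.
    return [(i, j) for i in range(len(rects))
                   for j in range(i + 1, len(rects))
                   if rectangles_intersect(rects[i], rects[j])]

# Two 2-D rectangles that share only an edge still count as intersecting here.
r1 = [(0, 2), (0, 2)]
r2 = [(1, 3), (1, 3)]
r3 = [(5, 6), (5, 6)]
print(intersecting_pairs([r1, r2, r3]))  # -> [(0, 1)]
```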

16.
In the collect problem (M. Saks, N. Shavit, and H. Woll, in “Proceedings of the 2nd ACM–SIAM Symposium on Discrete Algorithms,” 1991), n processors in a shared-memory system must each learn the values of n registers. We give a randomized algorithm that solves the collect problem in O(n log^3 n) total read and write operations with high probability, even if timing is under the control of a content-oblivious adversary (a slight weakening of the usual adaptive adversary). This improves on both the trivial upper bound of O(n^2) steps and the best previously known bound of O(n^{3/2} log n) steps, and is close to the lower bound of Ω(n log n) steps. Furthermore, we show how this algorithm can be used to obtain a multiuse cooperative collect protocol that is O(log^3 n)-competitive in the latency model of Ajtai et al. (“Proceedings of the 33rd IEEE Symposium on Foundations of Computer Science,” 1994) and O(n^{1/2} log^{3/2} n)-competitive in the throughput model of Aspnes and Waarts (“Proceedings of the 28th ACM Symposium on Theory of Computing,” 1996). In both cases the competitive ratios are within a polylogarithmic factor of optimal.

17.
We achieve an O(log n) amortized time bound per operation for the off-line version of the dynamic convex hull problem in the plane. In this problem, a sequence of n insert, delete, and query instructions is to be processed, where each insert instruction adds a new point to the set, each delete instruction removes an existing point, and each query requests a standard convex hull search. We process the entire sequence in total O(n log n) time and O(n) space. Alternatively, we can preprocess a sequence of n insertions and deletions in O(n log n) time and space, then answer queries in history in O(log n) time apiece (a query in history means a query comes with a time parameter t, and it must be answered with respect to the convex hull present at time t). The same bounds also hold for the off-line maintenance of several related structures, such as the maximal vectors, the intersection of half-planes, and the kernel of a polygon. Achieving an O(log n) per-operation time bound for the on-line versions of these problems is a longstanding open problem in computational geometry.

18.
Given a set of values x_1, x_2, …, x_n, of which k are nonzero, the compaction problem is the problem of moving the nonzero elements into the first k consecutive memory locations. The chaining problem asks that the nonzero elements be put into a linked list. One can in addition require that the elements remain in the same order, leading to the problems of ordered compaction and ordered chaining, respectively. This paper introduces a technique involving perfect hash functions that leads to a deterministic algorithm for ordered compaction running on a CRCW PRAM in time O(log k / log log n) using n processors. A matching lower bound for unordered compaction is given. The ordered chaining problem is shown to be solvable in time O(α(k)) with n processors (where α is a functional inverse of Ackermann's function), and unordered chaining is shown to be solvable in constant time with n processors when k < n^{1/4−ε}.
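Sequentially, ordered compaction is just a prefix sum over the nonzero indicator: each nonzero element's destination is its prefix count. The sketch below only illustrates the specification; the paper's contribution is achieving it deterministically on a CRCW PRAM in O(log k / log log n) time.

```python
def ordered_compaction(values):
    flags = [1 if v != 0 else 0 for v in values]
    dest, prefix = 0, []
    for f in flags:                 # prefix sums (a standard parallel primitive)
        prefix.append(dest)
        dest += f
    out = [0] * len(values)
    for i, v in enumerate(values):
        if v != 0:
            out[prefix[i]] = v      # nonzero elements land in the first k slots, in order
    return out, dest                # dest == k, the number of nonzero elements

print(ordered_compaction([0, 7, 0, 0, 3, 9, 0]))  # -> ([7, 3, 9, 0, 0, 0, 0], 3)
```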

19.
We address a number of extremal point query problems when P is a set of n points in R^d, d ≥ 3 a constant, including the computation of the farthest point from a query line and the computation of the farthest point from each of the lines spanned by the points in P. In R^3, we give a data structure of size O(n^{1+ε}) that can be constructed in O(n^{1+ε}) time and can report the farthest point of P from a query line segment in O(n^{2/3+ε}) time, where ε > 0 is an arbitrarily small constant. Applications of our results also include: (1) sub-cubic time algorithms for fitting a polygonal chain through an indexed set of points in R^d, d ≥ 3 a constant, and (2) a sub-quadratic time and space algorithm that, given P and an anchor point q, computes the minimum (maximum) area triangle defined by q with P∖{q}.

20.
Let a text of u characters over an alphabet of size σ be compressible to n phrases by the LZ78 algorithm. We show how to build a data structure based on the Ziv–Lempel trie, called the LZ-index, that takes 4n log₂ n (1 + o(1)) bits of space (that is, 4 times the entropy of the text for ergodic sources) and reports the R occurrences of a pattern of length m in worst-case time O(m^3 log σ + (m + R) log n). We present a practical implementation of the LZ-index, which is faster than current alternatives when we take into consideration the time to report the positions or text contexts of the occurrences found.
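A minimal sketch of the LZ78 parse and trie that the LZ-index is built on — only the parse into n phrases, not the index itself or its 4n log₂ n bit representation:

```python
def lz78_parse(text):
    trie = {}                     # maps (parent_phrase_id, char) -> phrase_id
    phrases = []                  # phrase i+1 is (parent_phrase_id, char); id 0 = empty phrase
    node = 0
    for ch in text:
        if (node, ch) in trie:    # keep extending the current phrase inside the trie
            node = trie[(node, ch)]
        else:                     # new phrase = longest previously seen phrase + one char
            phrases.append((node, ch))
            trie[(node, ch)] = len(phrases)
            node = 0
    # (a trailing incomplete phrase, if any, is ignored in this sketch)
    return phrases

# "ababc" parses into the phrases a, b, ab, c.
print(lz78_parse("ababc"))  # -> [(0, 'a'), (0, 'b'), (1, 'b'), (0, 'c')]
```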
