首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 500 毫秒
1.
The indexing problem is where a text is preprocessed and subsequent queries of the form “Find all occurrences of pattern P in the text” are answered in time proportional to the length of the query and the number of occurrences. In the dictionary matching problem a set of patterns is preprocessed and subsequent queries of the form “Find all occurrences of dictionary patterns in text T” are answered in time proportional to the length of the text and the number of occurrences.There exist efficient worst-case solutions for the indexing problem and the dictionary matching problem, but none that find approximate occurrences of the patterns, i.e., where the pattern is within a bound edit (or Hamming) distance from the appropriate text location.In this paper we present a uniform deterministic solution to both the indexing and the general dictionary matching problem with one error. We preprocess the data in time O(n log2 n), where n is the text size in the indexing problem and the dictionary size in the dictionary matching problem. Our query time for the indexing problem is O(m log n log log n + tocc), where m is the query string size and tocc is the number of occurrences. Our query time for the dictionary matching problem is O(n log3 d log log d + tocc), where n is the text size and d the dictionary size. The time bounds above apply to both bounded and unbounded alphabets.  相似文献   

2.
We present the first nontrivial algorithm for approximate pattern matching on compressed text. The format we choose is the Ziv–Lempel family. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to kinsertions, deletions and substitutions. On LZ78/LZW we need O(mkn+R) time in the worst case and O(k2n+mkmin(n,(mσ)k)+R) on average where σ is the alphabet size. The experimental results show a practical speedup over the basic approach of up to 2X for moderate m and small k. We extend the algorithms to more general compression formats and approximate matching models.  相似文献   

3.
New text indexing functionalities of the compressed suffix arrays are proposed. The compressed suffix array proposed by Grossi and Vitter is a space-efficient data structure for text indexing. It occupies only O(n) bits for a text of length n; however it also uses the text itself that occupies bits for the alphabet . In this paper we modify the data structure so that pattern matching can be done without any access to the text. In addition to the original functions of the compressed suffix array, we add new operations search, decompress and inverse to the compressed suffix arrays. We show that the new index can find occ occurrences of any substring P of the text in O(|P|logn+occlogεn) time for any fixed 1ε>0 without access to the text. The index also can decompress a part of the text of length m in O(m+logεn) time. For a text of length n on an alphabet such that , our new index occupies only bits where is the order-0 entropy of the text. Especially for ε=1 the size is bits. Therefore the index will be smaller than the text, which means we can perform fast queries from compressed texts.  相似文献   

4.
In this paper, we consider the pattern matching problem in DNA and RNA sequences where either the pattern or the text can be degenerate, i.e., contain sets of characters. We present an asymptotically faster algorithm for the above problem that works in O(n log m) time, where n and m is the length of the text and the pattern respectively. We also suggest an efficient implementation of our algorithm, which works in linear time when the pattern size is small. Finally, we also describe how our approach can be used to solve the distributed pattern matching problem. The preliminary version of this paper appeared in [26].  相似文献   

5.
This paper generalizes the dynamic text indexing problem, introduced in [15], to insertion and deletion of strings. The problem is to quickly answer on-line queries about the occurrences of arbitrary pattern strings in a text that may change due to insertion or deletion of strings from it. To treat strings as atomic objects, we provide new sequential techniques and related data structures, which combine the suffix tree with the naming technique used in parallel computation on strings. We also introduce a geometric interpretation of the problem of finding the occurrences of a pattern in a given substring of the text. As a result, the algorithm allows the insertion in the text of a stringSinO(|S| log(n + |S|)) amortized time, and the deletion from the text of a stringSinO(|S| log n) amortized time, wherenis the length of the current text. A pattern search requiresO(p log p + upd ( + log p) + pocc) worst-case time, wherepis the length of the pattern andpoccis the number of its occurrences in the current text, obtained after the execution ofupdupdate operations. This solution requiresO(n2 log n) space, which is not initialized.We also provide a technique to reduce the space toO(n log n), yielding a solution that requiresO((p + upd) log p + pocc) query time in the worst-case,O(|S| log3/2(|S| + n)) amortized time to insert a stringSin, andO(|S| log3/2n) amortized time to delete a stringSfrom the current text.Furthermore, we use our techniques to solve the novel on-line dynamic tree matching problem that requires the on-line detection of the occurrences of an arbitrary subtree in a forest of ordered labeled trees. The forest may change due to insertion or deletion of subtrees or by renaming of some nodes. Such a problem is solved by a simple reduction to the dynamic text indexing problem.  相似文献   

6.
The problem of sorting n integers from a restricted range [1…m], where m is a superpolynomial in n, is considered. An o(n log n) randomized algorithm is given. Our algorithm takes O(n log log m) expected time and O(n) space. (Thus, for m = npolylog(n) we have an O(n log log n) algorithm.) The algorithm is parallelizable. The resulting parallel algorithm achieves optimal speedup. Some features of the algorithm make us believe that it is relevant for practical applications. A result of independent interest is a parallel hashing technique. The expected construction time is logarithmic using an optimal number of processors, and searching for a value takes O(1) time in the worst case. This technique enables drastic reduction of space requirements for the price of using randomness. Applicability of the technique is demonstrated for the parallel sorting algorithm and for some parallel string matching algorithms. The parallel sorting algorithm is designed for a strong and nonstandard model of parallel computation. Efficient simulations of the strong model on a CRCW PRAM are introduced. One of the simulations even achieves optimal speedup. This is probably the first optimal speedup simulation of a certain kind.  相似文献   

7.
The string matching with mismatches problem requires finding the Hamming distance between a pattern P of length m and every length m substring of text T with length n. Fischer and Paterson's FFT-based algorithm solves the problem without error in O(σnlogm), where σ is the size of the alphabet Σ [SIAM–AMS Proc. 7 (1973) 113–125]. However, this in the worst case reduces to O(nmlogm). Atallah, Chyzak and Dumas used the idea of randomly mapping the letters of the alphabet to complex roots of unity to estimate the score vector in time O(nlogm) [Algorithmica 29 (2001) 468–486]. We show that the algorithm's score variance can be substantially lowered by using a bijective mapping, and specifically to zero in the case of binary and ternary alphabets. This result is extended via alphabet remappings to deterministically solve the string matching with mismatches problem with a constant factor of 2 improvement over Fischer–Paterson's method.  相似文献   

8.
All-Pairs Small-Stretch Paths   总被引:1,自引:0,他引:1  
Let G = (VE) be a weighted undirected graph. A path between uv  V is said to be of stretch t if its length is at most t times the distance between u and v in the graph. We consider the problem of finding small-stretch paths between all pairs of vertices in the graph G.It is easy to see that finding paths of stretch less than 2 between all pairs of vertices in an undirected graph with n vertices is at least as hard as the Boolean multiplication of two n × n matrices. We describe three algorithms for finding small-stretch paths between all pairs of vertices in a weighted graph with n vertices and m edges. The first algorithm, STRETCH2, runs in Õ(n3/2m1/2) time and finds stretch 2 paths. The second algorithm, STRETCH7/3, runs in Õ(n7/3) time and finds stretch 7/3 paths. Finally, the third algorithm, STRETCH3, runs in Õ(n2) and finds stretch 3 paths.Our algorithms are simpler, more efficient and more accurate than the previously best algorithms for finding small-stretch paths. Unlike all previous algorithms, our algorithms are not based on the construction of sparse spanners or sparse neighborhood covers.  相似文献   

9.
The main result of this paper is an algorithm for approximate matching of a regular expression of size m in a text of size n in time O(nm/logd+2n), where d is the number of allowed errors. This algorithm is the first o(mn) algorithm for approximate matching to regular expressions.  相似文献   

10.
A matching game is a cooperative game (N, v) defined on a graph G = (N, E) with an edge weighting w: E? \mathbb R+{w: E\to {\mathbb R}_+}. The player set is N and the value of a coalition S í N{S \subseteq N} is defined as the maximum weight of a matching in the subgraph induced by S. First we present an O(nm + n 2 log n) algorithm that tests if the core of a matching game defined on a weighted graph with n vertices and m edges is nonempty and that computes a core member if the core is nonempty. This algorithm improves previous work based on the ellipsoid method and can also be used to compute stable solutions for instances of the stable roommates problem with payments. Second we show that the nucleolus of an n-player matching game with a nonempty core can be computed in O(n 4) time. This generalizes the corresponding result of Solymosi and Raghavan for assignment games. Third we prove that is NP-hard to determine an imputation with minimum number of blocking pairs, even for matching games with unit edge weights, whereas the problem of determining an imputation with minimum total blocking value is shown to be polynomial-time solvable for general matching games.  相似文献   

11.
Given two undirected trees T and P, the Subtree Homeomorphism Problem is to find whether T has a subtree t that can be transformed into P by removing entire subtrees, as well as repeatedly removing a degree-2 node and adding the edge joining its two neighbors. In this paper we extend the Subtree Homeomorphism Problem to a new optimization problem by enriching the subtree-comparison with node-to-node similarity scores. The new problem, called Approximate Labelled Subtree Homeomorphism (ALSH), is to compute the homeomorphic subtree of T which also maximizes the overall node-to-node resemblance. We describe an O(m2n/logm+mnlogn) algorithm for solving ALSH on unordered, unrooted trees, where m and n are the number of vertices in P and T, respectively. We also give an O(mn) algorithm for rooted ordered trees and O(mnlogm) and O(mn) algorithms for unrooted cyclically ordered and unrooted linearly ordered trees, respectively.  相似文献   

12.
Cheriyan and Hagerup developed a randomized algorithm to compute the maximum flow in a graph with n nodes and m edges in O(mn + n2 log2n) expected time. The randomization is used to efficiently play a certain combinatorial game that arises during the computation. We give a version of their algorithm where a general version of their game arises. Then we give a strategy for the game that yields a deterministic algorithm for computing the maximum flow in a directed graph with n nodes and m edges that runs in time O(mn(logm/n log nn)). Our algorithm gives an O(mn) deterministic algorithm for all m/n = Ω(nε) for any positive constant ε, and is currently the fastest deterministic algorithm for computing maximum flow as long as m/n = ω(log n).  相似文献   

13.
Consider the problem of identifying min T(f) and max F(f) of a positive (i.e., monotone) Boolean functionf, by using membership queries only, where min T(f) (max F(f)) denotes the set of minimal true vectors (maximum false vectors) off. Moreover, as the existence of a polynomial total time algorithm (i.e., polynomial time in the length of input and output) for this problem is still open, we consider here a restricted problem: given an unknown positive functionfofnvariables, decide whetherfis 2-monotonic or not, and iffis 2-monotonic, output both min T(f) and max F(f). For this problem, we propose a simple algorithm, which is based on the concept of maximum latency, and we show that it usesO(n2m) time andO(n2m) queries, wherem = |min T(f)| + |max F(f)|. This answers affirmatively the conjecture raised in Boroset al.[Lecture Notes in Comput. Sci.557(1991), 104–115], Boroset al.[SIAM J. Comput.26(1997), 93–109], and is an improvement over the two algorithms discussed therein: one usesO(n3m) time andO(n3m) queries, and the other usesO(nm2 + n2m) time andO(nm) queries.  相似文献   

14.
Given a rectangular m×n matrix stored as a two-dimensional array, we want to transpose it in place and measure the cost by the number of memory writes and the number of auxiliary cells used. We propose a transposition algorithm with optimal complexity O(mn) using only min(m,n) auxiliary memory cells.  相似文献   

15.
Ray Shooting Amidst Convex Polygons in 2D   总被引:1,自引:0,他引:1  
We consider the problem of ray shooting in a two-dimensional scene consisting ofmconvex polygons with a total ofnedges. We present a data structure that requiresO(mn log m) space and preprocessing time and that answers a ray shooting query inO(log2 m log2 n) time. If the polygons are pairwise disjoint, the space and preprocessing time can be improved toO((m2+n)log m) andO((m2+n log n)log m), respectively. Our algorithm also works for a collection of disjoint simple polygons. We also show that if we allow onlyO(n) space, a ray shooting query among a collection of disjoint simple polygons can be answered in timeO(m/[formula]1+ log2 n) time, for any >0.  相似文献   

16.
《Journal of Complexity》1999,15(1):30-71
We describe fast parallel algorithms for building index data structures that can be used to gather various statistics on square matrices. The main data structure is the Lsuffix tree, which is a generalization of the classical suffix tree for strings. Given ann×ntext matrixA, we build our data structures inO(log n) time withn2processors on a CRCW PRAM, so that we can quickly processAin parallel as follows: (i) report some statistical information aboutA, e.g., find the largest repeated square submatrices that appear at least twice inAor determine, for each position inA, the smallest submatrix that occurs only there; (ii) given, on-line, anm×mpattern matrixPAT, check whether it occurs inA. We refer to the above two kinds of operations as queries and point out that they have applications to visual databases and two-dimensional data compression. Query (i) takesO(log n) time withn2/log nprocessors and query (ii) takesO(log m) time withm2/log mprocessors. The query algorithms are work optimal while the construction algorithm is work optimal only for arbitrary and large alphabets.  相似文献   

17.
We consider Nash equilibria in 2‐player random games and analyze a simple Las Vegas algorithm for finding an equilibrium. The algorithm is combinatorial and always finds a Nash equilibrium; on m × n payoff matrices, it runs in time O(m2nloglog n + n2mloglog m) with high probability. Our result follows from showing that a 2‐player random game has a Nash equilibrium with supports of size two with high probability, at least 1 − O(1/log n). Our main tool is a polytope formulation of equilibria. © 2007 Wiley Periodicals, Inc. Random Struct. Alg., 2007  相似文献   

18.
Beautiful formulas are known for the expected cost of random two‐dimensional assignment problems, but in higher dimensions even the scaling is not known. In three dimensions and above, the problem has natural “Axial” and “Planar” versions, both of which are NP‐hard. For 3‐dimensional Axial random assignment instances of size n, the cost scales as Ω(1/ n), and a main result of the present paper is a linear‐time algorithm that, with high probability, finds a solution of cost O(n–1+o(1)). For 3‐dimensional Planar assignment, the lower bound is Ω(n), and we give a new efficient matching‐based algorithm that with high probability returns a solution with cost O(n log n). © 2013 Wiley Periodicals, Inc. Random Struct. Alg., 46, 160–196, 2015  相似文献   

19.
Linear matroid parity generalizes matroid intersection and graph matching (and hence network flow, degree-constrained subgraphs, etc.). A polynomial algorithm was given by Lovász. This paper presents an algorithm that uses timeO(mn 3), wherem is the number of elements andn is the rank. (The time isO(mn 2.5) using fast matrix multiplication; both bounds assume the uniform cost model). For graphic matroids the time isO(mn 2). The algorithm is based on the method of augmenting paths used in the algorithms for all subcases of the problem. First author was supported in part by the National Science Foundation under grants MCS 78-18909, MCS-8302648, and DCR-8511991. The research was done while the second author was at the University of Denver and at the University of Colorado at Boulder.  相似文献   

20.
The stable roommates problem is that of matchingn people inton/2 disjoint pairs so that no two persons, who are not paired together, both prefer each other to their respective mates under the matching. Such a matching is called a complete stable matching. It is known that a complete stable matching may not exist. Irving proposed anO(n 2) algorithm that would find one complete stable matching if there is one, or would report that none exists. Since there may not exist any complete stable matching, it is natural to consider the problem of finding a maximum stable matching, i.e., a maximum number of disjoint pairs of persons such that these pairs are stable among themselves. In this paper, we present anO(n 2) algorithm, which is a modified version of Irving's algorithm, that finds a maximum stable matching.This research was supported by National Science Council of Republic of China under grant NSC 79-0408-E009-04.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号