首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
We show how to determine whether a given pattern p of length m occurs in a given text t of length n in time (where allows for logarithmic factors in m and n/m) with inverse polynomial failure probability. This algorithm combines quantum searching algorithms with a technique from parallel string matching, called Deterministic Sampling.  相似文献   

2.
Average-optimal string matching   总被引:2,自引:0,他引:2  
The exact string matching problem is to find the occurrences of a pattern of length m from a text of length n symbols. We develop a novel and unorthodox filtering technique for this problem. Our method is based on transforming the problem into multiple matching of carefully chosen pattern subsequences. While this is seemingly more difficult than the original problem, we show that the idea leads to very simple algorithms that are optimal on average. We then show how our basic method can be used to solve multiple string matching as well as several approximate matching problems in average optimal time. The general method can be applied to many existing string matching algorithms. Our experimental results show that the algorithms perform very well in practice.  相似文献   

3.
We present the first nontrivial algorithm for approximate pattern matching on compressed text. The format we choose is the Ziv–Lempel family. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to kinsertions, deletions and substitutions. On LZ78/LZW we need O(mkn+R) time in the worst case and O(k2n+mkmin(n,(mσ)k)+R) on average where σ is the alphabet size. The experimental results show a practical speedup over the basic approach of up to 2X for moderate m and small k. We extend the algorithms to more general compression formats and approximate matching models.  相似文献   

4.
The string matching with mismatches problem is that of finding the number of mismatches between a pattern P of length m and every length m substring of the text T. Currently, the fastest algorithms for this problem are the following. The Galil–Giancarlo algorithm finds all locations where the pattern has at most k errors (where k is part of the input) in time O(nk). The Abrahamson algorithm finds the number of mismatches at every location in time . We present an algorithm that is faster than both. Our algorithm finds all locations where the pattern has at most k errors in time . We also show an algorithm that solves the above problem in time O((n+(nk3)/m)logk).  相似文献   

5.
Indexing methods for the approximate string matching problem spend a considerable effort generating condensed neighborhoods. Condensed neighborhoods, however, are not a minimal representation of a pattern neighborhood. Super condensed neighborhoods, proposed in this work, are smaller, provably minimal and can be used to locate approximate matches that can later be extended by on-line search. We present an algorithm for generating Super Condensed Neighborhoods. The algorithm can be implemented either by using dynamic programming or non-deterministic automata. The time complexity is O(ms) for the first case and O(kms) for the second, where m is the pattern size, s is the size of the super condensed neighborhood and k the number of errors. Previous algorithms depended on the size of the condensed neighborhood instead. These algorithms can be implemented using Bit-Parallelism and Increased Bit-Parallelism techniques. Our experimental results show that the resulting algorithms are fast and achieve significant speedups, when compared with the existing proposals that use condensed neighborhoods.  相似文献   

6.
Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed-up the performances of classical string matching algorithms. In this model an algorithm operates on words of length w, grouping blocks of characters, and arithmetic and logic operations on the words take one unit of time.In this paper we use specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology, to design a very fast string matching algorithm. We evaluate our solution in terms of efficiency, stability and flexibility, where we propose to use the deviation in running time of an algorithm on distinct equal length patterns as a measure of stability.From our experimental results it turns out that, despite their quadratic worst case time complexity, the new presented algorithm becomes the clear winner on the average in many cases, when compared against the most recent and effective algorithms known in literature.  相似文献   

7.
Quasi-Newton algorithms minimize a functionF(x),xR n, searching at any iterationk along the directions k=?H kgk, whereg k=?F(x k) andH k approximates in some sense the inverse Hessian ofF(x) atx k. When the matrixH is updated according to the formulas in Broyden's family and when an exact line search is performed at any iteration, a compact algorithm (free from the Broyden's family parameter) can be conceived in terms of the followingn ×n matrix: $$H{_R} = H - Hgg{^T} H/g{^T} Hg,$$ which can be viewed as an approximating reduced inverse Hessian. In this paper, a new algorithm is proposed which uses at any iteration an (n?1)×(n?1) matrixK related toH R by $$H_R = Q\left[ {\begin{array}{*{20}c} 0 & 0 \\ 0 & K \\ \end{array} } \right]Q$$ whereQ is a suitable orthogonaln×n matrix. The updating formula in terms of the matrixK incorporated in this algorithm is only moderately more complicated than the standard updating formulas for variable-metric methods, but, at the same time, it updates at any iteration a positive definite matrixK, instead of a singular matrixH R. Other than the compactness with respect to the algorithms with updating formulas in Broyden's class, a further noticeable feature of the reduced Hessian algorithm is that the downhill condition can be stated in a simple way, and thus efficient line searches may be implemented.  相似文献   

8.
在基因工程中,经常需要在一个较长的DNA链中寻找一小段DNA片段.本文提出了一个新的匹配算法使得当对一个长为n的DNA链t进行检索时,在最坏的情况下克只需要比较n次就能找到一个预先给定的长为m的DNA片段户在t中所有出现的地方.而且对该算法稍加改动即可用于一般的关键字搜索(或称串匹配).在同类算法中,该算法可能是迄今为止最有效的.  相似文献   

9.
String matching is the problem of finding all the occurrences of a pattern in a text. We present a new method to compute the combinatorial shift function (“matching shift”) of the well-known Boyer–Moore string matching algorithm. This method implies the computation of the length of the longest suffixes of the pattern ending at each position in this pattern. These values constituted an extra-preprocessing for a variant of the Boyer–Moore algorithm designed by Apostolico and Giancarlo. We give here a new presentation of this algorithm that avoids extra preprocessing together with a tight bound of 1.5n character comparisons (where n is the length of the text).  相似文献   

10.
Two words are called k-abelian equivalent, if they share the same multiplicities for all factors of length at most k. We present an optimal linear time algorithm for identifying all occurrences of factors in a text that are k-abelian equivalent to some pattern P. Moreover, an optimal algorithm for finding the largest k for which two words are k-abelian equivalent is given. Solutions for online versions of the k-abelian pattern matching problem are also proposed.  相似文献   

11.
Any nonempty string of the form xx is called a repetition. An O(n log n) algorithm is presented to find all repetitions in a string of lenght n. The algorithm is based on a linear algorithm to find all the new repetitions formed when two strings are concatenated. This linear algorithm is possible because new repetitions of equal length must occur in blocks with consecutive starting positions. The linear algorithm uses a variation of the Knuth-Morris-Pratt algorithm to find all partial occurrences of a pattern within a text string. It is also shown that no algorithm based on comparisons of symbols can improve O(n log n). Finally, some open problems and applications are suggested.  相似文献   

12.
We give first the representation of a suffix tree that uses n lg n + O(n) bits of space and supports searching for a pattern string in the given text (from a fixed size alphabet) in O(m) time, where n is the size of the text and m is the length of the pattern. The structure is quite simple and answers a question raised by Muthukrishnan (in Proceedings of the FST and TCS Preconference Workshop on Randomization, 1997, pp. 23–27). Previous compact representations of suffix trees had either a higher lower order term in space and had some expectation assumption or required more time for searching. When the size of the alphabet k is not viewed as a constant, this structure can be modified to use the same space but take O(m lg k) time for string searching or to use an additional O(n lg k) bits but take the same O(m) time for searching. We then give several index structures for binary texts, with less space including
• a structure that uses a suffix array (lg  bits) and an additional () bits,
• an indexing structure that takes bits of space, and
• an ( lg ) bit structure which answers in () time, the decision question of whether a given pattern of length occurs in the text.
Each of these structures uses a different technique, either in the storage scheme or in the search algorithm, in reducing the space requirement. The first one uses a suffix array, a sparse suffix tree, and a table structure. Finding all the occurrences of a pattern using this structure takes O(m + s) time, where s is the number of occurrences of the pattern in the text. The second structure constructs a sparse suffix tree for all the suffixes that start with the bit that occurs more times in the given binary text. The last structure uses an iterative algorithm to search for the pattern. This structure is the first o(n lg n) bit index to support the decision version of indexing queries in time linear in the length of the pattern. But this does not support the general indexing queries where we want to find the position of the occurrence of the pattern.Our main contribution is the development of techniques to use the succinct tree representation through balanced parentheses for suffix trees.  相似文献   

13.
Efficient algorithms are given to find the maximum lengthn of an ordered list in which 4 elements can be merged using exactlyk comparisons. A top down algorithm for the (2,n) merge problem is discussed and is shown to obtain the optimal merge length first reported by Hwang and Lin. Our algorithms combine this top down approach and strong heuristics, some of which derived from Hwang's optimal algorithm for the (3,n) problem, and produce a lengthn which is close to the optimal lengthf 4(k).  相似文献   

14.
Bioinformatics, the discipline which studies the computational problems arising from molecular biology, poses many interesting problems to the string searching community. We will describe two problems arising from Bioinformatics, their preliminary solutions, and the more general problem that they pose. The first problem is searching for α-helices in protein sequences. This particular instance of the search is based on matching of hydrophobicity/hydrophilicity. We find an algorithm which is linear in the sequence length for fixed helix length and is O(nlogn) for any helix length. The second problem is on matching probabilistic sequences against sequences or against other probabilistic sequences. In both cases we derive efficient formulas to compute scores according to a Markovian model of evolution.  相似文献   

15.
For a string A=a1an, a reversalρ(i,j), 1?i?j?n, transforms the string A into a string A=a1ai-1ajaj-1aiaj+1an, that is, the reversal ρ(i,j) reverses the order of symbols in the substring aiaj of A. In the case of signed strings, where each symbol is given a sign + or -, the reversal operation also flips the sign of each symbol in the reversed substring. Given two strings, A and B, signed or unsigned, sorting by reversals (SBR) is the problem of finding the minimum number of reversals that transform the string A into the string B.Traditionally, the problem was studied for permutations, that is, for strings in which every symbol appears exactly once. We consider a generalization of the problem, k-SBR, and allow each symbol to appear at most k times in each string, for some k?1. The main result of the paper is an O(k2)-approximation algorithm running in time O(n). For instances with , this is the best known approximation algorithm for k-SBR and, moreover, it is faster than the previous best approximation algorithm.  相似文献   

16.
We consider a family of generalized matching problems called k-feasible matching (k-RM) problems, where k? {1,2,3,…} ∪ {∞}. We show each k-FM problem to be NP-complete even for very restricted cases. We develop a dynamic programming algorithm that solves in polynomial time the k-FM problem for graphs with width bounded by 2k. We also show that for any subset S of {1,2,…} ∪ {∞}, there is a set D of problem instances such that for k in S the k-FM problem is NP-complete on D, while for k not in S the k-FM problem is polynomially solvable on D.  相似文献   

17.
An approach for factoring general boolean functions was described in Golumbic and Mintz [Factoring logic functions using graph partitioning, in: Proceedings of IEEE/ACM International Conference on Computer Aided Design, November 1999, pp. 195-198] and Mintz and Golumbic [Factoring Boolean functions using graph partitioning, Discrete Appl. Math. 149 (2005) 131-153] which is based on graph partitioning algorithms. In this paper, we present a very fast algorithm for recognizing and factoring read-once functions which is needed as a dedicated factoring subroutine to handle the lower levels of that factoring process. The algorithm is based on algorithms for cograph recognition and on checking normality.For non-read-once functions, we investigate their factoring based on their corresponding graph classes. In particular, we show that if a function F is normal and its corresponding graph is a partial k-tree, then F is a read 2k function and a read 2k formula for F can be obtained in polynomial time.  相似文献   

18.
The paper gives a new interpretation and a possible optimization of the wellknown k-means algorithm for searching for a locally optimal partition of the set A = {a i ∈ ? n : i = 1, …, m} which consists of k disjoint nonempty subsets π1, …, π k , 1 ? k ? m. For this purpose, a new divided k-means algorithm was constructed as a limit case of the known smoothed k-means algorithm. It is shown that the algorithm constructed in this way coincides with the k-means algorithm if during the iterative procedure no data points appear in the Voronoi diagram. If in the partition obtained by applying the divided k-means algorithm there are data points lying in the Voronoi diagram, it is shown that the obtained result can be improved further.  相似文献   

19.
Motivated by the gateway placement problem in wireless networks, we consider the geometric k-centre problem on unit disc graphs: given a set of points P in the plane, find a set F of k points in the plane that minimizes the maximum graph distance from any vertex in P to the nearest vertex in F in the unit disc graph induced by PF. We show that the vertex 1-centre provides a 7-approximation of the geometric 1-centre and that a vertex k-centre provides a 13-approximation of the geometric k-centre, resulting in an O(kn)-time 26-approximation algorithm. We describe O(n2m)-time and O(n3)-time algorithms, respectively, for finding exact and approximate geometric 1-centres, and an O(mn2k)-time algorithm for finding a geometric k-centre for any fixed k. We show that the problem is NP-hard when k is an arbitrary input parameter. Finally, we describe an O(n)-time algorithm for finding a geometric k-centre in one dimension.  相似文献   

20.
For any zero-nonzero pattern of a matrix, the minimum possible rank is at least the size of a sub-pattern that is permutation equivalent to a triangular pattern with nonzero diagonal. For certain numbers of rows and columns, the minimum rank of a pattern is k only when there is a k-by-k such triangle. Here, we complete the determination of such sizes by showing that an m-by-n pattern of minimum rank k must contain a k-triangle for m=5, k=4; m=6, k=5; and m=6, k=4. A table is given showing whether or not this happens for all m, n, k. In the process, a Schur complement approach to minimum rank is described and used, and simple ways to recognize the presence of triangles of sizes less than 7 are given.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号