首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
We present an index that stores a text of length n such that given a pattern of length m, all the substrings of the text that are within Hamming distance (or edit distance) at most k from the pattern are reported in O(m+loglogn+#matches) time (for constant k). The space complexity of the index is O(n1+?) for any constant ?>0.  相似文献   

2.
This paper surveys techniques for designing efficient sequential and parallel approximate string matching algorithms. Special attention is given to the methods for the construction of data structures that efficiently support primitive operations needed in approximate string matching.  相似文献   

3.
We present three new algorithms for on‐line multiple string matching allowing errors. These are extensions of previous algorithms that search for a single pattern. The average running time achieved is in all cases linear in the text size for moderate error level, pattern length, and number of patterns. They adapt (with higher costs) to the other cases. However, the algorithms differ in speed and thresholds of usefulness. We theoretically analyze when each algorithm should be used, and show their performance experimentally. The only previous solution for this problem allows only one error. Our algorithms are the first to allow more errors, and are faster than previous work for a moderate number of patterns (e.g. less than 50–100 on English text, depending on the pattern length). © 2002 John Wiley & Sons, Inc. Random Struct. Alg., 20, 23–49, 2002  相似文献   

4.
Bit-parallel approximate string matching algorithms with transposition   总被引:1,自引:0,他引:1  
Using bit-parallelism has resulted in fast and practical algorithms for approximate string matching under Levenshtein edit distance, which permits a single edit operation to insert, delete or substitute a character. Depending on the parameters of the search, currently the fastest non-filtering algorithms in practice are the O(km/wn) algorithm of Wu and Manber, the O((k+2)(mk)/wn) algorithm of Baeza-Yates and Navarro, and the O(m/wn) algorithm of Myers, where m is the pattern length, n is the text length, k is the error threshold and w is the computer word size. In this paper we discuss a uniform way of modifying each of these algorithms to permit also a fourth type of edit operation: transposing two adjacent characters in the pattern. This type of edit distance is also known as Damerau edit distance. In the end we also present an experimental comparison of the resulting algorithms.  相似文献   

5.
6.
In this paper, we study the approximate string matching problem under a string distance whose edit operations are translocations of equal length factors. We extend a graph-theoretic approach proposed by Rahman and Illiopoulos (2008) to model our problem. In the sequel, we devise efficient algorithms based on this model to solve a number of variants of the string matching problem with translocations.  相似文献   

7.
8.
Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed-up the performances of classical string matching algorithms. In this model an algorithm operates on words of length w, grouping blocks of characters, and arithmetic and logic operations on the words take one unit of time.In this paper we use specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology, to design a very fast string matching algorithm. We evaluate our solution in terms of efficiency, stability and flexibility, where we propose to use the deviation in running time of an algorithm on distinct equal length patterns as a measure of stability.From our experimental results it turns out that, despite their quadratic worst case time complexity, the new presented algorithm becomes the clear winner on the average in many cases, when compared against the most recent and effective algorithms known in literature.  相似文献   

9.
We introduce a problem called maximum common characters in blocks (MCCB), which arises in applications of approximate string comparison, particularly in the unification of possibly erroneous textual data coming from different sources. We show that this problem is NP-complete, but can nevertheless be solved satisfactorily using integer linear programming for instances of practical interest. Two integer linear formulations are proposed and compared in terms of their linear relaxations. We also compare the results of the approximate matching with other known measures such as the Levenshtein (edit) distance.  相似文献   

10.
11.
12.
We provide calculus rules for global approximate minima concerning usual operations on functions. The formulas we obtain are then applied to approximate subdifferential calculus. In this way, new results are presented, for example on the approximate subdifferential of a deconvolution, or on the subdifferential of an upper envelope of convex functions.  相似文献   

13.
The string matching with mismatches problem requires finding the Hamming distance between a pattern P of length m and every length m substring of text T with length n. Fischer and Paterson's FFT-based algorithm solves the problem without error in O(σnlogm), where σ is the size of the alphabet Σ [SIAM–AMS Proc. 7 (1973) 113–125]. However, this in the worst case reduces to O(nmlogm). Atallah, Chyzak and Dumas used the idea of randomly mapping the letters of the alphabet to complex roots of unity to estimate the score vector in time O(nlogm) [Algorithmica 29 (2001) 468–486]. We show that the algorithm's score variance can be substantially lowered by using a bijective mapping, and specifically to zero in the case of binary and ternary alphabets. This result is extended via alphabet remappings to deterministically solve the string matching with mismatches problem with a constant factor of 2 improvement over Fischer–Paterson's method.  相似文献   

14.
In this paper, we first establish a locality theory for the Noethericity of generalized boundary value problems on the spaces . By means of this theory, of the classical boundary value theory, and of the theory of Fourier analysis, we discuss the necessary and sufficient conditions of the solvability and obtain the general solutions and the Noether conditions for one class of generalized boundary value problems. All cases as regards the index of the coefficients in the equations are considered in detail. Moreover, we apply our theoretical results to the solvability of singular integral equations with variable coefficients. Thus, this paper will be of great significance for the study of improving and developing complex analysis, integral equation, and boundary value theory.  相似文献   

15.
Given a pattern string P=p1p2pm and K parallel text strings over an integer alphabet Σ, our task is to find the smallest integer κ>0 such that P can be split into κ pieces P=P1Pκ, where each Pi has an occurrence in some text track Tki and these partial occurrences retain the order. We study some variations of this minimum splitting problem, such as splittings with limited gaps and transposition invariance, and show how to use sparse dynamic programming to solve the variations efficiently. In particular, we show that the minimum splitting problem can be interpreted as a shortest path problem on line segments.  相似文献   

16.
The string matching with mismatches problem is that of finding the number of mismatches between a pattern P of length m and every length m substring of the text T. Currently, the fastest algorithms for this problem are the following. The Galil–Giancarlo algorithm finds all locations where the pattern has at most k errors (where k is part of the input) in time O(nk). The Abrahamson algorithm finds the number of mismatches at every location in time . We present an algorithm that is faster than both. Our algorithm finds all locations where the pattern has at most k errors in time . We also show an algorithm that solves the above problem in time O((n+(nk3)/m)logk).  相似文献   

17.
In this paper, a string matching scheme is proposed to inspect two-dimensional objects for dimensional and shape verification in industrial environment. This approach consists of two stages. First, the procedures of determining the invariant starting point for boundary tracing and locating the cornerpoints of a curved object for polygon approximation are derived. To speed up the process, an optimization-based unconstrained line search method is used to locate the cornerpoints of the polygon image of a curved object. These cornerpoints are then recorded as feature string. At last, the feature string for each tested object are utilized to find the exact correspondence to one of several model objects. In contrast to conventional matching methods, which requires translation and rotation of the tested image before matching, the proposed method proves to be computationally efficient for real-time applications.  相似文献   

18.
In the context of algorithmic parameter optimization, there is much room for efficient usage of computational resources. We consider the Opal framework in which a nonsmooth optimization problem models the parameter identification task, and is solved by a mesh adaptive direct search solver. Each evaluation of trial parameters requires the processing of a potentially large number of independent tasks. We describe and evaluate several strategies for using parallelism in this setting. Our test scenario consists in optimizing five parameters of a trust-region method for smooth unconstrained minimization.  相似文献   

19.
Asymptotics are found of the exact upper bound of the maximum of the derivatives on an interval for a class of algebraic polynomials of a specified degree which possess a majorization on a given set of points in the basic segment. Application to approximate differentiation is given.Translated from Matematicheskie Zametki, Vol. 5, No. 2, pp. 183–194, February, 1969.  相似文献   

20.
A new formulation for the action of the Nambu-Goto string moving in curved space-time is studied. The action contains a two-dimensional “cosmological” world-sheet term multiplied by a tension parameter. It is shown that when treating this cosmological term as a perturbation, nonlinear equations for the string can be linearized. It is argued that in the zeroth-order approximation with respect to the tension, strings and membranes can be considered as dominant gravitation sources for the Friedmann universe with κ=0. Translated from Teoreticheskaya i Matematicheskaya Fizika, Vol. 111, No. 3, pp. 402–412, June, 1997.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号