Similar literature
 Found 20 similar documents (search time: 0 ms)
1.
2.
The edit distance problem for rooted unordered trees is known to be NP-hard. Based on this fact, this paper studies exponential-time algorithms for the problem. For a general case, an $O(\min(1.26^{n_1+n_2}, 2^{b_1+b_2} \cdot \mathrm{poly}(n_1,n_2)))$ time algorithm is presented, where $n_1$ and $n_2$ are the numbers of nodes and $b_1$ and $b_2$ are the numbers of branching nodes in two input trees. This algorithm is obtained by a combination of dynamic programming, exhaustive search, and maximum weighted bipartite matching. For bounded degree trees over a fixed alphabet, it is shown that the problem can be solved in $O((1+\epsilon)^{n_1+n_2})$ time for any fixed $\epsilon > 0$. This result is achieved by avoiding duplicate calculations for identical subsets of small subtrees.

3.
Bit-parallel approximate string matching algorithms with transposition   Total citations: 1, self-citations: 0, citations by others: 1
Using bit-parallelism has resulted in fast and practical algorithms for approximate string matching under Levenshtein edit distance, which permits a single edit operation to insert, delete or substitute a character. Depending on the parameters of the search, currently the fastest non-filtering algorithms in practice are the $O(k \lceil m/w \rceil n)$ algorithm of Wu and Manber, the $O(\lceil (k+2)(m-k)/w \rceil n)$ algorithm of Baeza-Yates and Navarro, and the $O(\lceil m/w \rceil n)$ algorithm of Myers, where $m$ is the pattern length, $n$ is the text length, $k$ is the error threshold and $w$ is the computer word size. In this paper we discuss a uniform way of modifying each of these algorithms to permit also a fourth type of edit operation: transposing two adjacent characters in the pattern. This type of edit distance is also known as Damerau edit distance. In the end we also present an experimental comparison of the resulting algorithms.
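As a reference point for the distance itself, the following is a minimal sketch of Damerau edit distance in its restricted form (often called optimal string alignment), where a transposed pair of adjacent characters is not edited again. It uses plain dynamic programming, not any of the bit-parallel algorithms discussed in the abstract, and the function name `osa_distance` is chosen here for illustration only.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau edit distance via a full (len(a)+1) x (len(b)+1) table."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters of a
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Fourth operation: transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

This runs in O(mn) time and space, so it is useful only as a correctness baseline against which a faster implementation could be checked.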

4.
5.
6.
7.
We present a new variant of the suffix tree called a distributed suffix tree (DST) which allows for much larger databases of strings to be handled efficiently. The method is based on a new linear time construction algorithm for subtrees of a suffix tree. The new data structure tackles the memory bottleneck problem by constructing these subtrees independently and in parallel. It is designed for distributed memory parallel computing environments (e.g., Beowulf clusters). The central advantage is that standard operations of biological importance on suffix trees are shown to be easily translatable to this new data structure. While none of these operations on the DST require inter-process communication, many have optimal expected parallel running times.

8.
In this paper we consider the incremental/decremental version of the edit distance problem: given a solution to the edit distance between two strings A and B, find a solution to the edit distance between A and B′ where B′=aB (incremental) or bB′=B (decremental). As a solution for the edit distance between A and B, we define the difference representation of the D-table, which leads to a simple and intuitive algorithm for the incremental/decremental edit distance problem.
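For orientation, here is a minimal sketch of the standard edit distance D-table together with one simple way of storing it as differences between horizontally adjacent cells. This is an assumption about what a difference encoding can look like; the paper's exact representation and its incremental update are not reproduced here, and the names `d_table` and `horizontal_differences` are hypothetical.

```python
def d_table(a: str, b: str) -> list[list[int]]:
    """Standard Levenshtein D-table: d[i][j] = distance between a[:i] and b[:j]."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
    return d


def horizontal_differences(d: list[list[int]]) -> list[list[int]]:
    """Replace each row by the differences d[i][j] - d[i][j-1]."""
    return [[row[j] - row[j - 1] for j in range(1, len(row))] for row in d]
```

Adjacent cells of a D-table differ by at most 1, so every stored difference is -1, 0, or 1. The baseline for the incremental case is to recompute `d_table(A, a + B)` from scratch, which is the cost an incremental algorithm aims to avoid.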

9.
In this paper, we study the approximate string matching problem under a string distance whose edit operations are translocations of equal length factors. We extend a graph-theoretic approach proposed by Rahman and Iliopoulos (2008) to model our problem. In the sequel, we devise efficient algorithms based on this model to solve a number of variants of the string matching problem with translocations.

10.
We present a new index for approximate string matching. The index collects text q-samples, that is, disjoint text substrings of length q, at fixed intervals and stores their positions. At search time, part of the text is filtered out by noticing that any occurrence of the pattern must be reflected in the presence of some text q-samples that match approximately inside the pattern. Hence the index points out the text areas that could contain occurrences and must be verified. The index parameters permit load balancing between filtering and verification work, and provide a compromise between the space requirement of the index and the error level for which the filtration is still efficient. We show experimentally that the index is competitive against others that take more space, being in fact the fastest choice for intermediate error levels, an area where no current index is useful.
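The collection step described above can be sketched as follows. This is only the sampling part of such an index, not the approximate filtering or verification; the function name `build_q_sample_index` and the step parameter `h` are hypothetical names introduced for illustration.

```python
from collections import defaultdict


def build_q_sample_index(text: str, q: int, h: int) -> dict[str, list[int]]:
    """Map each q-sample taken every h characters to its start positions."""
    if q <= 0 or h < q:
        # h >= q keeps consecutive samples disjoint, as the abstract requires.
        raise ValueError("need q > 0 and h >= q so that samples are disjoint")
    index = defaultdict(list)
    for pos in range(0, len(text) - q + 1, h):
        index[text[pos:pos + q]].append(pos)
    return dict(index)
```

For example, `build_q_sample_index("abracadabra", 2, 3)` samples positions 0, 3, 6 and 9. A larger `h` gives a smaller index but leaves fewer samples inside any pattern occurrence, which is the space versus error-level trade-off the abstract mentions.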

11.
(δ,γ)-matching is a string matching problem with applications to music retrieval. The goal is, given a pattern $P_{1 \ldots m}$ and a text $T_{1 \ldots n}$ on an alphabet of integers, to find the occurrences $P'$ of the pattern in the text such that (i) $|P_i - P'_i| \le \delta$ for all $1 \le i \le m$, and (ii) $\sum_{1 \le i \le m} |P_i - P'_i| \le \gamma$. The problem makes sense for $\delta \le \gamma \le \delta m$. Several techniques for (δ,γ)-matching have been proposed, based on bit-parallelism or on skipping characters. We first present an $O(mn \log(\gamma)/w)$ worst-case time and $O(n)$ average-case time bit-parallel algorithm (where $w$ is the number of bits in the computer word). It improves the previous $O(mn \log(\delta m)/w)$ worst-case time algorithm of the same type. Second, we combine our bit-parallel algorithm with suffix automata to obtain the first algorithm that skips characters using both δ and γ. This algorithm examines fewer characters than any previous approach, as the others do just δ-matching and check the γ-condition on the candidates. We implemented our algorithms and drew experimental results on real music, showing that our algorithms are superior to current alternatives for high values of δ.
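To make the matching condition concrete, here is a naive O(mn) checker. It assumes the usual definition of (δ,γ)-matching: at every aligned position the absolute difference is at most δ, and the sum of those differences over the whole window is at most γ. It is a brute-force baseline, not the bit-parallel or suffix-automaton algorithms of the abstract, and the function name is hypothetical.

```python
def delta_gamma_matches(pattern: list[int], text: list[int],
                        delta: int, gamma: int) -> list[int]:
    """Return the 0-based start positions where pattern (delta, gamma)-matches text."""
    m = len(pattern)
    positions = []
    for start in range(len(text) - m + 1):
        total = 0
        ok = True
        for i in range(m):
            diff = abs(pattern[i] - text[start + i])
            total += diff
            # Stop at the first violation of either condition.
            if diff > delta or total > gamma:
                ok = False
                break
        if ok:
            positions.append(start)
    return positions
```

With MIDI-like pitches, `delta_gamma_matches([60, 62, 64], [60, 63, 64, 61, 62, 65, 70], 1, 2)` accepts the windows starting at 0 and 3. Tightening `gamma` to 1 keeps only the window at 0, because the window at 3 accumulates a total difference of 2.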

12.
This paper introduces an intuitionistic fuzzy automaton model for computing the similarity between pairs of strings. The model details the possible edit operations needed to transform any input (observed) string into a target (pattern) string by providing a membership and non-membership value between them. Finally, an algorithm is given for approximate string matching, and the proposed model computes the similarity and dissimilarity between the pair of strings, leading to a better approximation.

13.
The string matching with mismatches problem is that of finding the number of mismatches between a pattern $P$ of length $m$ and every length $m$ substring of the text $T$. Currently, the fastest algorithms for this problem are the following. The Galil–Giancarlo algorithm finds all locations where the pattern has at most $k$ errors (where $k$ is part of the input) in time $O(nk)$. The Abrahamson algorithm finds the number of mismatches at every location in time $O(n\sqrt{m \log m})$. We present an algorithm that is faster than both. Our algorithm finds all locations where the pattern has at most $k$ errors in time $O(n\sqrt{k \log k})$. We also show an algorithm that solves the above problem in time $O((n + nk^3/m) \log k)$.
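The problem statement can be pinned down with a brute-force O(nm) baseline that counts mismatches at every alignment and then filters by the threshold k. It is a sketch of the problem definition only, not of any of the algorithms named in the abstract, and both function names are hypothetical.

```python
def mismatch_counts(pattern: str, text: str) -> list[int]:
    """Number of mismatching positions for every length-m window of text."""
    m = len(pattern)
    return [
        sum(1 for i in range(m) if pattern[i] != text[start + i])
        for start in range(len(text) - m + 1)
    ]


def k_mismatch_positions(pattern: str, text: str, k: int) -> list[int]:
    """0-based start positions where pattern occurs with at most k mismatches."""
    return [s for s, c in enumerate(mismatch_counts(pattern, text)) if c <= k]
```

For the pattern `"abc"` and the text `"abcabdxbc"`, the windows at positions 0, 3 and 6 have at most one mismatch, so `k_mismatch_positions("abc", "abcabdxbc", 1)` reports exactly those three positions.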

14.
15.
For a string $A = a_1 \ldots a_n$, a reversal $\rho(i,j)$, $1 \le i \le j \le n$, transforms the string $A$ into a string $A' = a_1 \ldots a_{i-1} a_j a_{j-1} \ldots a_i a_{j+1} \ldots a_n$, that is, the reversal $\rho(i,j)$ reverses the order of symbols in the substring $a_i \ldots a_j$ of $A$. In the case of signed strings, where each symbol is given a sign + or -, the reversal operation also flips the sign of each symbol in the reversed substring. Given two strings, $A$ and $B$, signed or unsigned, sorting by reversals (SBR) is the problem of finding the minimum number of reversals that transform the string $A$ into the string $B$. Traditionally, the problem was studied for permutations, that is, for strings in which every symbol appears exactly once. We consider a generalization of the problem, k-SBR, and allow each symbol to appear at most $k$ times in each string, for some $k \ge 1$. The main result of the paper is an $O(k^2)$-approximation algorithm running in time $O(n)$. For instances with […], this is the best known approximation algorithm for k-SBR and, moreover, it is faster than the previous best approximation algorithm.
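The reversal operation itself is easy to state in code. The sketch below applies a single reversal with 1-based, inclusive indices as in the definition above. Signed strings are modeled here as lists of nonzero integers whose sign carries the + or -, which is a representation chosen for illustration. Finding a minimum sequence of reversals is the hard part and is not attempted.

```python
def reverse_unsigned(a: str, i: int, j: int) -> str:
    """Apply the reversal rho(i, j), 1 <= i <= j <= len(a), to an unsigned string."""
    if not 1 <= i <= j <= len(a):
        raise ValueError("need 1 <= i <= j <= len(a)")
    return a[:i - 1] + a[i - 1:j][::-1] + a[j:]


def reverse_signed(a: list[int], i: int, j: int) -> list[int]:
    """Apply rho(i, j) to a signed string: reverse the segment and flip each sign."""
    if not 1 <= i <= j <= len(a):
        raise ValueError("need 1 <= i <= j <= len(a)")
    return a[:i - 1] + [-x for x in reversed(a[i - 1:j])] + a[j:]
```

Note that on signed strings a reversal with i = j is not a no-op, since it flips the sign of a single symbol, whereas on unsigned strings it leaves the string unchanged.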

16.
17.
Space efficient linear time construction of suffix arrays   Total citations: 1, self-citations: 0, citations by others: 1
We present a linear time algorithm to sort all the suffixes of a string over a large alphabet of integers. The sorted order of suffixes of a string is also called suffix array, a data structure introduced by Manber and Myers that has numerous applications in pattern matching, string processing, and computational biology. Though the suffix tree of a string can be constructed in linear time and the sorted order of suffixes derived from it, a direct algorithm for suffix sorting is of great interest due to the space requirements of suffix trees. Our result is one of the first linear time suffix array construction algorithms, which improve upon the previously known O(nlogn) time direct algorithms for suffix sorting. It can also be used to derive a different linear time construction algorithm for suffix trees. Apart from being simple and applicable for alphabets not necessarily of fixed size, this method of constructing suffix trees is more space efficient.
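To show what a suffix array is and how it supports pattern matching, here is a deliberately naive sketch. It builds the array by sorting suffixes directly, which costs far more than linear time and is not the construction algorithm of the abstract, and then locates a pattern with two binary searches. Both function names are hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of text, in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of pattern in text, found by binary search over sa."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

The search works because the suffixes that begin with the pattern occupy one contiguous block of the sorted order, so the array needs only one integer per text position.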

18.
In dendrochronology, wood samples are dated according to the tree rings they contain. The dating process consists of comparing the sequence of tree ring widths in the sample to a dated master sequence. Assuming that a tree forms exactly one ring per year, a simple sliding algorithm solves this matching task.

But sometimes a tree produces no ring or even two rings in a year. If a sample sequence contains inconsistencies of this kind, it cannot be dated correctly by the simple sliding algorithm. We therefore introduce a […] algorithm for dating such a sample sequence against an error-free master sequence, where n and m are the lengths of the sequences. Our algorithm takes into account that the sample might contain up to […] missing or double rings and suggests possible positions for inconsistencies of this kind. This is done by employing an edit distance as the distance measure.
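The simple sliding algorithm for the error-free case can be sketched as follows. The abstract does not say which score the sliding comparison uses, so the sum of absolute differences between ring widths is an assumption made here purely for illustration, and the name `best_offset` is hypothetical. The edit-distance-based method for missing or double rings is not reproduced.

```python
def best_offset(master: list[float], sample: list[float]) -> tuple[int, float]:
    """Slide sample along master and return (offset, cost) with the lowest cost.

    Cost is the sum of absolute ring-width differences (an assumed measure).
    """
    m = len(sample)
    if m == 0 or m > len(master):
        raise ValueError("sample must be non-empty and no longer than master")
    best = None
    for offset in range(len(master) - m + 1):
        cost = sum(abs(master[offset + i] - sample[i]) for i in range(m))
        if best is None or cost < best[1]:
            best = (offset, cost)
    return best
```

If the master sequence starts in a known year, adding the returned offset to that year dates the first ring of the sample. A single missing or double ring shifts every later ring by one position, which is why this approach fails on such samples.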


19.
We compare two popular scenario tree generation methods in the context of financial optimization: moment matching and scenario reduction. Using a simple problem with a known analytic solution, moment matching, when ensuring absence of arbitrage, replicates this solution precisely. On the other hand, even if the scenario trees generated by scenario reduction are arbitrage-free, the solutions are biased and highly variable. These results hold for correlated and uncorrelated asset returns, as well as for normal and non-normal returns.

20.
We propose a process for determining approximate matches, in terms of the bottleneck distance, under color preserving rigid motions, between two colored point sets $A, B \subset \mathbb{R}^2$, $|A| \le |B|$. We solve the matching problem by generating all representative motions that bring $A$ close to a subset $B'$ of set $B$ and then using a graph matching algorithm. We also present an approximate matching algorithm with improved computational time. In order to get better running times for both algorithms we present a lossless filtering preprocessing step. By using it, we determine some candidate zones, which are regions that contain a subset $S$ of $B$ such that $A$ may match one or more subsets $B'$ of $S$. Then, we solve the matching problem between $A$ and every candidate zone. Experimental results using both synthetic and real data are reported to prove the effectiveness of the proposed approach.
