Similar literature
20 similar documents found (search time: 343 ms)
1.
Given two strings A and B of lengths na and nb, na ≤ nb, respectively, the all-substrings longest common subsequence (ALCS) problem obtains, for every substring B′ of B, the length of the longest string that is a subsequence of both A and B′. The ALCS problem has many applications, such as finding approximate tandem repeats in strings, solving the circular alignment of two strings and finding the alignment of one string with several others that have a common substring. We present an algorithm to prepare the basic data structure for ALCS queries that takes O(na·nb) time and O(na+nb) space. After this preparation, it is possible to build a matrix of size that allows any LCS length to be retrieved in constant time. Some trade-offs between the space required and the querying time are discussed. To our knowledge, this is the first algorithm in the literature for the ALCS problem.
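
To make the problem statement concrete, here is a minimal brute-force sketch of what an ALCS table contains. It is not the algorithm from the abstract: it simply reruns the textbook LCS dynamic program from every start position of B, so it takes O(na·nb²) time. The function name `all_substring_lcs` and the half-open index convention `(i, j)` for `b[i:j]` are assumptions made for illustration.

```python
def all_substring_lcs(a, b):
    """Map (i, j) to the LCS length of a and the substring b[i:j].

    Naive illustration only: one textbook LCS dynamic program per
    start position i of b, reading off an answer after each new
    character b[j] is appended to the substring.
    """
    result = {}
    for i in range(len(b)):
        # prev[l] = LCS length of a[:l] and the substring of b read so far
        prev = [0] * (len(a) + 1)
        for j in range(i, len(b)):
            cur = [0] * (len(a) + 1)
            for l in range(1, len(a) + 1):
                if a[l - 1] == b[j]:
                    cur[l] = prev[l - 1] + 1
                else:
                    cur[l] = max(prev[l], cur[l - 1])
            result[(i, j + 1)] = cur[len(a)]
            prev = cur
    return result
```

Once such a table has been built, each query is a single dictionary lookup, which mirrors the constant-time retrieval mentioned in the abstract, though at a much higher preprocessing cost.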

2.
The string merging problem is to determine a merged string from a given set of strings. The distinguishing property of a solution is that the total cost of editing all of the given strings into this solution is minimal. Necessary and sufficient conditions are presented for the case where this solution matches the solution to the string-to-string correction problem. A special case where deletion is the only allowed edit operation is shown to have the longest common subsequence of the strings as its solution. This research was supported by the U.S. Army Research Office.

3.
4.
The subsequence matching problem is to decide, for given strings S and T, whether S is a subsequence of T. The string S is called the pattern and the string T the text. We consider the case of multiple texts and show how to solve the subsequence matching problem in time linear in the length of the pattern. For this purpose we build an automaton that accepts all subsequences of given texts. This automaton is called the Directed Acyclic Subsequence Graph (DASG). We prove an upper bound for its number of states. Furthermore, we consider a modification of the subsequence matching problem: given a string S and a finite language L, we are to decide whether S is a subsequence of any string in L. We suppose that a finite automaton accepting L is given and present an algorithm for building the DASG for language L. We also mention applications of the DASG to some problems related to subsequences.
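
For the single-text case, the idea of preprocessing the text so that each pattern is checked in time linear in its length can be sketched with a next-occurrence table. This is a simplified stand-in, not the DASG construction for multiple texts described in the abstract, and the names `build_next_table` and `is_subsequence` are hypothetical.

```python
def build_next_table(text):
    """next_pos[i][c] is the smallest index j >= i with text[j] == c.

    Characters that do not occur at or after position i are simply
    absent from next_pos[i].
    """
    n = len(text)
    next_pos = [dict() for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        next_pos[i] = dict(next_pos[i + 1])  # inherit later occurrences
        next_pos[i][text[i]] = i             # text[i] itself is the nearest
    return next_pos


def is_subsequence(pattern, next_pos):
    """Decide whether pattern is a subsequence of the preprocessed text."""
    i = 0
    for c in pattern:
        j = next_pos[i].get(c)
        if j is None:
            return False
        i = j + 1  # continue searching strictly after the matched position
    return True
```

The table behaves like a deterministic automaton whose states are text positions: each pattern character triggers exactly one transition, so a query costs one lookup per pattern character regardless of the text length.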

5.
We consider two generalizations of the longest common subsequence (LCS) problem: the Set LCS problem and the Set-Set LCS problem. We present algorithms for the two problems that are faster than the previous ones by Hirschberg and Larmore.

6.
This paper concerns the longest common subsequence (LCS) shared by two sequences (or strings) of length N, whose elements are chosen at random from a finite alphabet. The exact distribution and the expected value of the length of the LCS, k say, between two random sequences is still an open problem in applied probability. While the expected value E(N) of the length of the LCS of two random strings is known to lie within certain limits, the exact value of E(N) and the exact distribution are unknown. In this paper, we calculate the length of the LCS for all possible pairs of binary sequences from N=1 to 14. The length of the LCS and the Hamming distance are represented in color on two all-against-all arrays. An iterative approach is then introduced in which we determine the pairs of sequences whose LCS lengths increased by one upon the addition of one letter to each sequence. The pairs whose score did increase are shown in black and white on an array, which has an interesting fractal-like structure. As the sequence length increases, R(N) (the proportion of sequences whose score increased) approaches the Chvátal–Sankoff constant a_c (the proportionality constant for the linear growth of the expected length of the LCS with sequence length). We show that R(N) is converging more rapidly to a_c than E(N)/N.
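
The exhaustive computation described above can be sketched for small N as follows. This is only an illustration under the assumption that both sequences are drawn uniformly from all binary strings of length N. The helper names `lcs_length` and `mean_lcs_length` are hypothetical, and the all-pairs loop grows as 4^N, so it is practical only for small N.

```python
from itertools import product


def lcs_length(x, y):
    """Textbook dynamic program for the LCS length of two sequences."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def mean_lcs_length(n):
    """Average LCS length over all ordered pairs of binary sequences of length n."""
    seqs = list(product("01", repeat=n))
    total = sum(lcs_length(x, y) for x in seqs for y in seqs)
    return total / len(seqs) ** 2
```

Dividing `mean_lcs_length(n)` by `n` gives the ratio E(N)/N that the abstract compares against R(N).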

7.
Computing the longest common subsequence of two sequences is one of the most studied algorithmic problems. In this work we focus on a particular variant of the problem, called repetition free longest common subsequence (RF-LCS), which has been proved to be NP-hard. We propose a hybrid genetic algorithm, which combines standard genetic algorithms and estimation of distribution algorithms, to solve this problem. An experimental comparison with some well-known approximation algorithms shows the suitability of the proposed technique.

8.
Sparse Dynamic Programming has emerged as an essential tool for the design of efficient algorithms for optimization problems coming from such diverse areas as computer science, computational biology, and speech recognition. We provide a new sparse dynamic programming technique that extends the Hunt–Szymanski paradigm for the computation of the longest common subsequence (LCS) and apply it to solve the LCS from Fragments problem: given a pair of strings X and Y (of length n and m, respectively) and a set M of matching substrings of X and Y, find the longest common subsequence based only on the symbol correspondences induced by the substrings. This problem arises in an application to analysis of software systems. Our algorithm solves the problem in O(|M| log |M|) time using balanced trees, or O(|M| log log min(|M|, nm/|M|)) time using Johnson's version of Flat Trees. These bounds apply for two cost measures. The algorithm can also be adapted to finding the usual LCS in O((m + n) log |Σ| + |M| log |M|) time using balanced trees or O((m + n) log |Σ| + |M| log log min (|M|, nm/|M|)) time using Johnson's version of Flat Trees, where M is the set of maximal matches between substrings of X and Y and Σ is the alphabet. These bounds improve on those of the original Hunt–Szymanski algorithm while retaining the overall approach.

9.
Given a set of strings U={T1,T2,…,T}, the longest common repeat problem is to find the longest common substring that appears at least twice in each string of U. We also consider reversed and reverse-complemented repeats as well as normal repeats. We present a linear time algorithm for the longest common repeat problem.

10.
We study the following problem. Given two sequences x and y over a finite alphabet, find a repetition-free longest common subsequence of x and y. We show several algorithmic results, a computational complexity result, and we describe a preliminary experimental study based on the proposed algorithms. We also show that this problem is APX-hard.

11.
String matching is the problem of finding all the occurrences of a pattern in a text. We present a new method to compute the combinatorial shift function (“matching shift”) of the well-known Boyer–Moore string matching algorithm. This method implies the computation of the length of the longest suffixes of the pattern ending at each position in this pattern. These values constituted an extra-preprocessing for a variant of the Boyer–Moore algorithm designed by Apostolico and Giancarlo. We give here a new presentation of this algorithm that avoids extra preprocessing together with a tight bound of 1.5n character comparisons (where n is the length of the text).

12.
A problem arising from the work of C.A.R. Hoare on parallel programming is that of deciding whether a given string is a “merge” of two other given strings σ and τ. We describe a polynomial time algorithm for this problem. This algorithm can easily be extended to check, in polynomial time, whether a given string is a merge of any fixed number of strings. The problem for an arbitrary number of strings is shown to be NP-complete and so is unlikely to have a polynomial time algorithm.
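
The two-string case can be illustrated with the textbook dynamic program for interleavings. The abstract does not spell out its algorithm, so this should be read as one possible polynomial-time check rather than the authors' method. The function name `is_merge` and the parameter name `target` are hypothetical.

```python
def is_merge(target, sigma, tau):
    """Return True if target interleaves sigma and tau, preserving each one's order.

    Uses O(len(sigma) * len(tau)) time and O(len(tau)) extra space.
    """
    m, n = len(sigma), len(tau)
    if len(target) != m + n:
        return False
    # While processing row i, reachable[j] says whether target[:i + j]
    # is a merge of sigma[:i] and tau[:j].
    reachable = [False] * (n + 1)
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                reachable[j] = True
                continue
            ch = target[i + j - 1]
            # reachable[j] still holds the value from row i - 1 here
            from_sigma = i > 0 and reachable[j] and sigma[i - 1] == ch
            # reachable[j - 1] has already been updated for row i
            from_tau = j > 0 and reachable[j - 1] and tau[j - 1] == ch
            reachable[j] = from_sigma or from_tau
    return reachable[n]
```

Extending the table to k strings adds one dimension per string, which stays polynomial for any fixed k but grows exponentially in k, consistent with the hardness result for an arbitrary number of strings.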

13.
Iterated hash functions process strings recursively, one character at a time. At each iteration, they compute a new hash value from the preceding hash value and the next character. We prove that iterated hashing can be pairwise independent, but never 3-wise independent. We show that it can be almost universal over strings much longer than the number of hash values; we bound the maximal string length given the collision probability.
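
As a minimal sketch of the iteration pattern described above, the following uses a fixed polynomial-style update rule. The update rule, the function name `iterated_hash`, and the default `base` and `modulus` are assumptions chosen for illustration. A single fixed function like this one shows only the structure of iterated hashing and says nothing about the independence or universality properties, which concern randomly chosen families of such functions.

```python
def iterated_hash(s, base=31, modulus=2**31 - 1):
    """Hash a string one character at a time.

    Each step computes the new hash value from the preceding hash
    value and the next character only.
    """
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % modulus
    return h
```

Because each step depends only on the previous value and the next character, the hash of a string can be extended to the hash of a longer string without rereading the prefix.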

14.
Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to Lempel–Ziv compression. For an SLP-compressed text of length , and an uncompressed pattern of length n, Cégielski et al. gave an algorithm for local subsequence recognition running in time . We improve the running time to . Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time ; the same problem with a compressed pattern is known to be NP-hard. Bibliography: 22 titles. Published in Zapiski Nauchnykh Seminarov POMI, Vol. 358, 2008, pp. 282–300.

15.
We study the problem of, given two finite sequences x and y, finding a repetition-free longest common subsequence of x and y. We show some algorithmic results, a complexity result, and a preliminary experimental study based on the proposed algorithms.

16.
This paper introduces an intuitionistic fuzzy automaton model for computing the similarity between pairs of strings. The model details the possible edit operations needed to transform any input (observed) string into a target (pattern) string by providing a membership and non-membership value between them. In the end, an algorithm is given for approximate string matching and the proposed model computes the similarity and dissimilarity between the pair of strings leading to better approximation.

17.
In the vein of recent algorithmic advances in polynomial factorization based on lifting and recombination techniques, we present new faster algorithms for computing the absolute factorization of a bivariate polynomial. The running time of our probabilistic algorithm is less than quadratic in the dense size of the polynomial to be factored.

18.
We consider the problems of (1) longest common subsequence (LCS) of two given strings in the case where the first may be shifted by some constant (that is, transposed) to match the second, and (2) transposition-invariant text searching using indel distance. These problems have applications in music comparison and retrieval. We introduce two novel techniques to solve these problems efficiently. The first is based on the branch and bound method, the second on bit-parallelism. Our branch and bound algorithm computes the longest common transposition-invariant subsequence (LCTS) in time O((m^2 + log log σ) log σ) in the best case and O((m^2 + log σ) σ) in the worst case, where m and σ, respectively, are the length of the strings and the size of the alphabet. On the other hand, we show that the same problem can be solved by using bit-parallelism and thus obtain a speedup of O(w / log m) over the classical algorithms, where the computer word has w bits. The advantage of this latter algorithm over the present bit-parallel ones is that it allows the use of more complex distances, including general integer weights. Since our branch and bound method is very flexible, it can be further improved by combining it with other efficient algorithms such as our novel bit-parallel algorithm. We experiment on several combination possibilities and discuss which are the best settings for each of those combinations. Our algorithms are easily extended to other musically relevant cases, such as δ-matching and polyphony (where there are several parallel texts to be considered). We also show how our bit-parallel algorithm is adapted to text searching and illustrate its effectiveness in complex cases where the only known competing method is the use of brute force.
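
The definition of LCTS can be made concrete with a brute-force baseline that tries every transposition and runs the textbook LCS dynamic program for each. This is neither the branch and bound nor the bit-parallel algorithm from the abstract. The function names are hypothetical, and the sequences are assumed to be lists of integers (for example, pitch values).

```python
def lcs_length(a, b, shift=0):
    """LCS length where a[i] matches b[j] exactly when a[i] + shift == b[j]."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x + shift == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[len(b)]


def lcts_length(a, b):
    """Longest common transposition-invariant subsequence length, by brute force.

    Only shifts that make at least one pair of symbols match can give
    a nonzero result, so it is enough to try every difference y - x.
    """
    shifts = {y - x for x in a for y in b}
    return max((lcs_length(a, b, t) for t in shifts), default=0)
```

For example, `lcts_length([60, 62, 64], [65, 61, 67, 69])` finds the transposition by 5 that aligns all three notes of the first sequence with a subsequence of the second.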

19.
Fast detection of string differences is a prerequisite for string clustering problems. An example of such a problem is the identification of duplicate information in the data cleansing stage of the data mining process. The relevant algorithms allow the application of large-scale clustering techniques in order to create clusters of similar strings. The vast majority of comparisons in such cases are between very dissimilar strings; therefore, methods that perform better at detecting large differences are preferable. This paper presents approaches which comply with this requirement, based on reformulation of the underlying shortest path problem. It is believed that such methods can lead to a family of new algorithms. An upper bound algorithm is presented, as an example, which produces promising results.

20.
Let X = X_1 ... X_n and Y = Y_1 ... Y_n be two binary sequences of length n. A common subsequence of X and Y is any subsequence of X that is at the same time a subsequence of Y; the common subsequence with maximal length is called the longest common subsequence (LCS) of X and Y. LCS is a common tool for measuring the closeness of X and Y. In this note, we consider the case when X and Y are both i.i.d. Bernoulli sequences with the parameters ϵ and 1 − ϵ, respectively. Hence, typically the sequences consist of long and short blocks of different colors. This suggests the so-called block-by-block alignment, where the short blocks in one sequence are matched to the long blocks of the same color in the other sequence. Such an alignment is not necessarily an LCS, but it is computationally easy to obtain and, therefore, of practical interest. We investigate the asymptotic properties of several block-by-block type alignments. The paper ends with a simulation study, where the block-by-block type alignments are compared with the LCS.
