
1.

An all-substrings common subsequence algorithm

C.E.R. Alves 《Discrete Applied Mathematics》2008,156(7):1025-1035

Given two strings A and B of lengths n_a and n_b, n_a ≤ n_b, respectively, the all-substrings longest common subsequence (ALCS) problem obtains, for every substring B′ of B, the length of the longest string that is a subsequence of both A and B′. The ALCS problem has many applications, such as finding approximate tandem repeats in strings, solving the circular alignment of two strings and finding the alignment of one string with several others that have a common substring. We present an algorithm to prepare the basic data structure for ALCS queries that takes O(n_a n_b) time and O(n_a + n_b) space. After this preparation, it is possible to build a matrix of size that allows any LCS length to be retrieved in constant time. Some trade-offs between the space required and the querying time are discussed. To our knowledge, this is the first algorithm in the literature for the ALCS problem.
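To make the ALCS definition concrete, the following brute-force Python sketch tabulates the LCS length of A against every substring of B. It is not the algorithm of the paper: it runs in O(n_a · n_b²) time and only illustrates what an ALCS query returns. The function name `alcs_table` and the table layout are assumptions made for this illustration.

```python
def alcs_table(a, b):
    """Return table with table[i][j] == LCS length of a and b[i:j] (i <= j).

    Brute-force illustration of the ALCS definition, not the paper's method.
    """
    n = len(b)
    table = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        # row[k] holds the LCS length of a[:k] and b[i:j] for the current j.
        row = [0] * (len(a) + 1)
        for j in range(i, n):
            new = [0]
            for k, ak in enumerate(a, start=1):
                if ak == b[j]:
                    new.append(row[k - 1] + 1)
                else:
                    new.append(max(row[k], new[k - 1]))
            row = new
            # After consuming b[j], the last cell is LCS(a, b[i:j+1]).
            table[i][j + 1] = row[-1]
    return table


table = alcs_table("abc", "cab")
print(table[0][3])  # LCS length of "abc" and "cab"
print(table[1][3])  # LCS length of "abc" and "ab"
```

Once such a table exists, each query is a single lookup, which is the constant-time retrieval the abstract refers to.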

2.

The string merging problem

Stephen Y. Itoga 《BIT Numerical Mathematics》1981,21(1):20-30

The string merging problem is to determine a merged string from a given set of strings. The distinguishing property of a solution is that the total cost of editing all of the given strings into this solution is minimal. Necessary and sufficient conditions are presented for the case where this solution matches the solution to the string-to-string correction problem. A special case where deletion is the only allowed edit operation is shown to have the longest common subsequence of the strings as its solution. This research was supported by the U.S. Army Research Office.

3.

Semi-local longest common subsequences in subquadratic time

《Journal of Discrete Algorithms》2008,6(4):570-581


4.

Directed acyclic subsequence graph—Overview

《Journal of Discrete Algorithms》2003,1(3-4):255-280

The subsequence matching problem is to decide, for given strings S and T, whether S is a subsequence of T. The string S is called the pattern and the string T the text. We consider the case of multiple texts and show how to solve the subsequence matching problem in time linear in the length of the pattern. For this purpose we build an automaton that accepts all subsequences of given texts. This automaton is called the Directed Acyclic Subsequence Graph (DASG). We prove an upper bound for its number of states. Furthermore, we consider a modification of the subsequence matching problem: given a string S and a finite language L, we are to decide whether S is a subsequence of any string in L. We suppose that a finite automaton accepting L is given and present an algorithm for building the DASG for language L. We also mention applications of the DASG to some problems related to subsequences.
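For a single text, the idea of answering subsequence queries in time linear in the pattern length can be sketched with a next-occurrence table. This is only the one-text special case, written as an assumption-laden illustration; the multi-text DASG construction of the paper is more involved and is not reproduced here. The names `build_subsequence_automaton` and `is_subsequence` are hypothetical.

```python
def build_subsequence_automaton(text):
    """Return nxt where nxt[i][c] is one past the first position p >= i
    with text[p] == c. State i means "the first i characters are consumed".
    """
    nxt = [dict() for _ in range(len(text) + 1)]
    for i in range(len(text) - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])   # inherit transitions of the later state
        nxt[i][text[i]] = i + 1     # text[i] itself is the nearest match
    return nxt


def is_subsequence(pattern, nxt):
    """Decide whether pattern is a subsequence of the preprocessed text."""
    state = 0
    for ch in pattern:
        state = nxt[state].get(ch)
        if state is None:           # no further occurrence of ch
            return False
    return True


automaton = build_subsequence_automaton("abcab")
print(is_subsequence("acb", automaton))
print(is_subsequence("cba", automaton))
```

Each query performs one dictionary lookup per pattern character, independent of the text length.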

5.

On the Set LCS and Set-Set LCS Problems

Wang B. F. Chen G. H. Park K. 《Journal of Algorithms in Cognition, Informatics and Logic》1993,14(3)

We consider two generalizations of the longest common subsequence (LCS) problem: the Set LCS problem and the Set-Set LCS problem. We present algorithms for the two problems that are faster than the previous ones by Hirschberg and Larmore.

6.

An Iterative Approach to Determining the Length of the Longest Common Subsequence of Two Strings

Booth Hilary S. MacNamara Shevarl F. Nielsen Ole M. Wilson Susan R. 《Methodology and Computing in Applied Probability》2004,6(4):401-421

This paper concerns the longest common subsequence (LCS) shared by two sequences (or strings) of length N, whose elements are chosen at random from a finite alphabet. The exact distribution and the expected value of the length of the LCS, k say, between two random sequences are still open problems in applied probability. While the expected value E(N) of the length of the LCS of two random strings is known to lie within certain limits, the exact value of E(N) and the exact distribution are unknown. In this paper, we calculate the length of the LCS for all possible pairs of binary sequences from N=1 to 14. The length of the LCS and the Hamming distance are represented in color on two all-against-all arrays. An iterative approach is then introduced in which we determine the pairs of sequences whose LCS lengths increased by one upon the addition of one letter to each sequence. The pairs whose score did increase are shown in black and white on an array, which has an interesting fractal-like structure. As the sequence length increases, R(N) (the proportion of sequences whose score increased) approaches the Chvátal–Sankoff constant a_c (the proportionality constant for the linear growth of the expected length of the LCS with sequence length). We show that R(N) is converging more rapidly to a_c than E(N)/N.
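The exhaustive computation described in this abstract can be reproduced for very small N with a short Python sketch. It enumerates all pairs of binary strings of length N and averages their LCS lengths, giving E(N) exactly. The function names are assumptions for this illustration, and the approach is only practical for small N because the number of pairs grows as 4^N.

```python
from itertools import product


def lcs_length(a, b):
    """Standard dynamic program for the LCS length, keeping one row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def expected_lcs(n):
    """Mean LCS length over all ordered pairs of binary strings of length n."""
    strings = list(product("01", repeat=n))
    total = sum(lcs_length(s, t) for s in strings for t in strings)
    return total / (len(strings) ** 2)


for n in range(1, 6):
    print(n, expected_lcs(n) / n)  # the ratio E(N)/N for small N
```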

7.

A hybrid genetic algorithm for the repetition free longest common subsequence problem

Mauro Castelli Stefano Beretta Leonardo Vanneschi 《Operations Research Letters》2013

Computing the longest common subsequence of two sequences is one of the most studied algorithmic problems. In this work we focus on a particular variant of the problem, called repetition free longest common subsequence (RF-LCS), which has been proved to be NP-hard. We propose a hybrid genetic algorithm, which combines standard genetic algorithms and estimation of distribution algorithms, to solve this problem. An experimental comparison with some well-known approximation algorithms shows the suitability of the proposed technique.

8.

Sparse Dynamic Programming for Longest Common Subsequence from Fragments

Brenda S. Baker Raffaele Giancarlo 《Journal of Algorithms in Cognition, Informatics and Logic》2002,42(2):231

Sparse Dynamic Programming has emerged as an essential tool for the design of efficient algorithms for optimization problems coming from such diverse areas as computer science, computational biology, and speech recognition. We provide a new sparse dynamic programming technique that extends the Hunt–Szymanski paradigm for the computation of the longest common subsequence (LCS) and apply it to solve the LCS from Fragments problem: given a pair of strings X and Y (of length n and m, respectively) and a set M of matching substrings of X and Y, find the longest common subsequence based only on the symbol correspondences induced by the substrings. This problem arises in an application to analysis of software systems. Our algorithm solves the problem in O(|M| log |M|) time using balanced trees, or O(|M| log log min(|M|, nm/|M|)) time using Johnson's version of Flat Trees. These bounds apply for two cost measures. The algorithm can also be adapted to finding the usual LCS in O((m + n) log |Σ| + |M| log |M|) time using balanced trees or O((m + n) log |Σ| + |M| log log min (|M|, nm/|M|)) time using Johnson's version of Flat Trees, where M is the set of maximal matches between substrings of X and Y and Σ is the alphabet. These bounds improve on those of the original Hunt–Szymanski algorithm while retaining the overall approach.

9.

Linear time algorithm for the longest common repeat problem

Inbok Lee Costas S. Iliopoulos Kunsoo Park 《Journal of Discrete Algorithms》2007,5(2):243-249

Given a set of strings U={T₁,T₂,…,T_ℓ}, the longest common repeat problem is to find the longest common substring that appears at least twice in each string of U. We also consider reversed and reverse-complemented repeats as well as normal repeats. We present a linear time algorithm for the longest common repeat problem.

10.

Repetition-free longest common subsequence

Said S. Adi Cristina G. Fernandes Fábio Viduani Martinez Marco A. Stefanes Yoshiko Wakabayashi 《Discrete Applied Mathematics》2010,158(12):1315-1086

We study the following problem. Given two sequences x and y over a finite alphabet, find a repetition-free longest common subsequence of x and y. We show several algorithmic results, a computational complexity result, and we describe a preliminary experimental study based on the proposed algorithms. We also show that this problem is APX-hard.
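The problem statement can be illustrated with a small exact search in Python. This is an exponential-time sketch for tiny inputs, assuming "repetition-free" means that no symbol occurs more than once in the common subsequence; it is not one of the algorithms proposed in the paper. The name `rf_lcs_length` is hypothetical.

```python
from functools import lru_cache


def rf_lcs_length(x, y):
    """Length of a longest common subsequence of x and y in which every
    symbol appears at most once. Exhaustive search with memoisation."""

    @lru_cache(maxsize=None)
    def best(i, j, used):
        if i == len(x) or j == len(y):
            return 0
        # Option 1: leave x[i] out of the subsequence.
        result = best(i + 1, j, used)
        # Option 2: take x[i], matched to its earliest occurrence in y[j:],
        # provided the symbol has not been used already.
        ch = x[i]
        if ch not in used:
            k = y.find(ch, j)
            if k != -1:
                result = max(result, 1 + best(i + 1, k + 1, used | {ch}))
        return result

    return best(0, 0, frozenset())


print(rf_lcs_length("abab", "baba"))  # the plain LCS would be longer here
```

The set of used symbols is part of the memoisation key, which is why the state space, and hence the running time, can grow exponentially with the alphabet size.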

11.

A unifying look at the Apostolico–Giancarlo string-matching algorithm

Maxime Crochemore Christophe Hancart Thierry Lecroq 《Journal of Discrete Algorithms》2003,1(1):37

String matching is the problem of finding all the occurrences of a pattern in a text. We present a new method to compute the combinatorial shift function (“matching shift”) of the well-known Boyer–Moore string matching algorithm. This method implies the computation of the length of the longest suffixes of the pattern ending at each position in this pattern. These values constituted an extra-preprocessing for a variant of the Boyer–Moore algorithm designed by Apostolico and Giancarlo. We give here a new presentation of this algorithm that avoids extra preprocessing together with a tight bound of 1.5n character comparisons (where n is the length of the text).

12.

On the computational complexity of a merge recognition problem

Anthony Mansfield 《Discrete Applied Mathematics》1983,5(1):119-122

A problem arising from the work of C.A.R. Hoare on parallel programming is that of deciding whether a given string ρ is a “merge” of two other given strings σ and τ. We describe a polynomial time algorithm for this problem. This algorithm can easily be extended to check, in polynomial time, whether ρ is a merge of any fixed number of strings. The problem for an arbitrary number of strings is shown to be NP-complete and so is unlikely to have a polynomial time algorithm.
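For two strings, a polynomial-time check can be sketched with a standard dynamic program. The sketch below assumes that "merge" means an interleaving that preserves the order of the characters of each input string; it is a textbook formulation and is not claimed to be the algorithm given in the paper. The function and parameter names are hypothetical.

```python
def is_merge(merged, sigma, tau):
    """Return True if merged is an order-preserving interleaving of
    sigma and tau. Runs in O(len(sigma) * len(tau)) time."""
    if len(merged) != len(sigma) + len(tau):
        return False
    # reachable[j] is True when merged[:i + j] is a merge of
    # sigma[:i] and tau[:j], for the current value of i.
    reachable = [False] * (len(tau) + 1)
    for i in range(len(sigma) + 1):
        for j in range(len(tau) + 1):
            if i == 0 and j == 0:
                reachable[j] = True
                continue
            ch = merged[i + j - 1]
            # Last character taken from sigma (uses the value for i - 1).
            from_sigma = i > 0 and reachable[j] and sigma[i - 1] == ch
            # Last character taken from tau (uses the value for j - 1).
            from_tau = j > 0 and reachable[j - 1] and tau[j - 1] == ch
            reachable[j] = from_sigma or from_tau
    return reachable[-1]


print(is_merge("aadbbcbcac", "aabcc", "dbbca"))
print(is_merge("aadbbbaccc", "aabcc", "dbbca"))
```

Extending the table to k dimensions handles any fixed number k of strings, at a cost that is polynomial for fixed k but grows exponentially in k.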

13.

The universality of iterated hashing over variable-length strings

Daniel Lemire 《Discrete Applied Mathematics》2012,160(4-5):604-617

Iterated hash functions process strings recursively, one character at a time. At each iteration, they compute a new hash value from the preceding hash value and the next character. We prove that iterated hashing can be pairwise independent, but never 3-wise independent. We show that it can be almost universal over strings much longer than the number of hash values; we bound the maximal string length given the collision probability.
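The iteration described here, a new hash value computed from the preceding value and the next character, can be sketched in Python as follows. The polynomial update rule, the multiplier, and the modulus are assumptions chosen for illustration only; the sketch makes no claim about the independence or universality properties analysed in the paper.

```python
def iterated_hash(text, multiplier=31, modulus=(1 << 61) - 1):
    """Hash text one character at a time.

    Each step derives the new value only from the previous hash value
    and the next character (a polynomial hash, used as an example).
    """
    h = 0
    for ch in text:
        h = (h * multiplier + ord(ch)) % modulus
    return h


# The hash of a longer string can be continued from the hash of its prefix.
prefix_hash = iterated_hash("a", 31, 101)
full_hash = (prefix_hash * 31 + ord("b")) % 101
print(full_hash == iterated_hash("ab", 31, 101))
```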

14.

Practical algorithms for transposition-invariant string-matching

Kjell Lemström Gonzalo Navarro Yoan Pinzon 《Journal of Discrete Algorithms》2005,3(2-4):267-292

We consider the problems of (1) longest common subsequence (LCS) of two given strings in the case where the first may be shifted by some constant (that is, transposed) to match the second, and (2) transposition-invariant text searching using indel distance. These problems have applications in music comparison and retrieval. We introduce two novel techniques to solve these problems efficiently. The first is based on the branch and bound method, the second on bit-parallelism. Our branch and bound algorithm computes the longest common transposition-invariant subsequence (LCTS) in time O((m²+loglogσ)logσ) in the best case and O((m²+logσ)σ) in the worst case, where m and σ, respectively, are the length of the strings and the size of the alphabet. On the other hand, we show that the same problem can be solved by using bit-parallelism and thus obtain a speedup of O(w/logm) over the classical algorithms, where the computer word has w bits. The advantage of this latter algorithm over the present bit-parallel ones is that it allows the use of more complex distances, including general integer weights. Since our branch and bound method is very flexible, it can be further improved by combining it with other efficient algorithms such as our novel bit-parallel algorithm. We experiment on several combination possibilities and discuss which are the best settings for each of those combinations. Our algorithms are easily extended to other musically relevant cases, such as δ-matching and polyphony (where there are several parallel texts to be considered). We also show how our bit-parallel algorithm is adapted to text searching and illustrate its effectiveness in complex cases where the only known competing method is the use of brute force.
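The LCTS definition can be illustrated with the brute-force baseline that the abstract alludes to: try every candidate transposition and run an ordinary LCS computation for each. The Python sketch below assumes the strings are sequences of integers (for example MIDI pitches). It is neither the branch and bound method nor the bit-parallel algorithm of the paper, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Standard dynamic program for the LCS length of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcts_length(a, b):
    """Longest common transposition-invariant subsequence, by brute force.

    Only transpositions t = y - x for some x in a and y in b can produce
    a match, so those are the only candidates tried.
    """
    best = 0
    for t in {y - x for x in a for y in b}:
        shifted = [x + t for x in a]
        best = max(best, lcs_length(shifted, b))
    return best


print(lcts_length([60, 62, 64], [65, 67, 69]))  # same melody, transposed
```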

15.

Repetition-free longest common subsequence

《Electronic Notes in Discrete Mathematics》2008

We study the problem of, given two finite sequences x and y, finding a repetition-free longest common subsequence of x and y. We show some algorithmic results, a complexity result, and a preliminary experimental study based on the proposed algorithms.

16.

Faster subsequence recognition in compressed strings

A. Tiskin 《Journal of Mathematical Sciences》2009,158(5):759-769

Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to Lempel–Ziv compression. For an SLP-compressed text of length , and an uncompressed pattern of length n, Cégielski et al. gave an algorithm for local subsequence recognition running in time . We improve the running time to . Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time ; the same problem with a compressed pattern is known to be NP-hard. Bibliography: 22 titles. Published in Zapiski Nauchnykh Seminarov POMI, Vol. 358, 2008, pp. 282–300.

17.

Intuitionistic Fuzzy Automaton for Approximate String Matching

《佛山科学技术学院》2014,6(1):29-39

This paper introduces an intuitionistic fuzzy automaton model for computing the similarity between pairs of strings. The model details the possible edit operations needed to transform any input (observed) string into a target (pattern) string by providing a membership and non-membership value between them. In the end, an algorithm is given for approximate string matching and the proposed model computes the similarity and dissimilarity between the pair of strings leading to better approximation.

18.

Lifting and recombination techniques for absolute factorization

Guillaume Chèze Grégoire Lecerf 《Journal of Complexity》2007

In the vein of recent algorithmic advances in polynomial factorization based on lifting and recombination techniques, we present new faster algorithms for computing the absolute factorization of a bivariate polynomial. The running time of our probabilistic algorithm is less than quadratic in the dense size of the polynomial to be factored.

19.

Calculating distances for dissimilar strings: The shortest path formulation revisited

K. Spiliopoulos S. Sofianopoulou 《European Journal of Operational Research》2007

Fast detection of string differences is a prerequisite for string clustering problems. An example of such a problem is the identification of duplicate information in the data cleansing stage of the data mining process. The relevant algorithms allow the application of large-scale clustering techniques in order to create clusters of similar strings. The vast majority of comparisons, in such cases, is between very dissimilar strings, therefore methods that perform better at detecting large differences are preferable. This paper presents approaches which comply with this requirement, based on reformulation of the underlying shortest path problem. It is believed that such methods can lead to a family of new algorithms. An upper bound algorithm is presented, as an example, which produces promising results.

20.

(Prefix) reversal distance for (signed) strings with few blocks or small alphabets

《Journal of Discrete Algorithms》2016

We study the String Reversal Distance problem, an extension of the well-known Sorting by Reversals problem. String Reversal Distance takes two strings S and T built on an alphabet Σ as input, and asks for a minimum number of reversals to obtain T from S. We consider four variants: String Reversal Distance, String Prefix Reversal Distance (a constrained version of the previous problem, in which any reversal must include the first letter of the string), and the signed variants of these problems, namely Signed String Reversal Distance and Signed String Prefix Reversal Distance. We study algorithmic properties of these four problems, in connection with two parameters of the input strings: the number of blocks they contain (a block being a maximal substring such that all letters in the substring are equal), and the alphabet size |Σ|. Concerning the number of blocks, we show that the four problems are fixed-parameter tractable (FPT) when the considered parameter is the maximum number of blocks among the two input strings. Concerning the alphabet size, we first show that String Reversal Distance and String Prefix Reversal Distance are NP-hard even if the input strings are built on a binary alphabet Σ = {0, 1}, each 0-block has length at most two and each 1-block has length one. We also show that Signed String Reversal Distance and Signed String Prefix Reversal Distance are NP-hard even if the input strings have only one letter. Finally, when |Σ| = O(1), we provide a singly-exponential algorithm that computes the exact distance between any pair of strings, for a large family of distances that we call well-formed, which includes the four distances we study here.
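The two unsigned variants and the block parameter can be made concrete with a small Python sketch. It counts blocks with `itertools.groupby` and computes reversal distances by breadth-first search over all strings reachable by reversals. The search is exponential and only suitable for very short strings; it is an illustration of the definitions, not the FPT or singly-exponential algorithms of the paper, and the function names are hypothetical.

```python
from collections import deque
from itertools import groupby


def count_blocks(s):
    """Number of maximal runs of equal letters in s."""
    return sum(1 for _ in groupby(s))


def reversal_distance(source, target, prefix_only=False):
    """Minimum number of substring reversals turning source into target.

    With prefix_only=True every reversal must start at the first letter.
    Returns None when the strings do not contain the same letters.
    """
    if sorted(source) != sorted(target):
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == target:
            return dist
        starts = [0] if prefix_only else range(len(current))
        for i in starts:
            # Reversing fewer than two letters changes nothing.
            for j in range(i + 2, len(current) + 1):
                nxt = current[:i] + current[i:j][::-1] + current[j:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None


print(count_blocks("0011101"))
print(reversal_distance("abc", "acb"))
print(reversal_distance("abc", "acb", prefix_only=True))
```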