首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 171 毫秒
1.
We study the String Reversal Distance problem, an extension of the well-known Sorting by Reversals problem. String Reversal Distance takes two strings S and T built on an alphabet Σ as input, and asks for a minimum number of reversals to obtain T from S. We consider four variants: String Reversal Distance, String Prefix Reversal Distance (a constrained version of the previous problem, in which any reversal must include the first letter of the string), and the signed variants of these problems, namely Signed String Reversal Distance and Signed String Prefix Reversal Distance. We study algorithmic properties of these four problems, in connection with two parameters of the input strings: the number of blocks they contain (a block being a maximal substring such that all letters in the substring are equal), and the alphabet size |Σ|. Concerning the number of blocks, we show that the four problems are fixed-parameter tractable (FPT) when the considered parameter is the maximum number of blocks among the two input strings. Concerning the alphabet size, we first show that String Reversal Distance and String Prefix Reversal Distance are NP-hard even if the input strings are built on a binary alphabet Σ={0,1}, each 0-block has length at most two and each 1-block has length one. We also show that Signed String Reversal Distance and Signed String Prefix Reversal Distance are NP-hard even if the input strings have only one letter. Finally, when |Σ|=O(1), we provide a singly-exponential algorithm that computes the exact distance between any pair of strings, for a large family of distances that we call well-formed, which includes the four distances we study here.  相似文献   

2.
This paper introduces an intuitionistic fuzzy automaton model for computing the similarity between pairs of strings. The model details the possible edit operations needed to transform any input (observed) string into a target (pattern) string by providing a membership and non-membership value between them. In the end, an algorithm is given for approximate string matching and the proposed model computes the similarity and dissimilarity between the pair of strings leading to better approximation.  相似文献   

3.
The consensus string problem for a metric is NP-complete   总被引:1,自引:0,他引:1  
Given a set S of strings, a consensus string of S based on consensus error is a string w that minimizes the sum of the distances between w and all the strings in S. In this paper, we show that the problem of finding a consensus string based on consensus error is NP-complete when the penalty matrix is a metric.  相似文献   

4.
The support vector machine (SVM) is a powerful learning algorithm, e.g., for classification and clustering tasks, that works even for complex data structures such as strings, trees, lists and general graphs. It is based on the usage of a kernel function for measuring scalar products between data units. For analyzing string data Lodhi et al. (J Mach Learn Res 2:419–444, 2002) have introduced a String Subsequence kernel (SSK). In this paper we propose an approximation to SSK based on dropping higher orders terms (i.e., subsequences which are spread out more than a certain threshold) that reduces the computational burden of SSK. As we are also concerned with practical application of complex kernels with high computational complexity and memory consumption, we provide an empirical model to predict runtime and memory of the approximation as well as the original SSK, based on easily measurable properties of input data. We provide extensive results on the properties of the proposed approximation, SSK-LP, with respect to prediction accuracy, runtime and memory consumption. Using some real-life datasets of text mining tasks, we show that models based on SSK and SSK-LP perform similarly for a set of real-life learning tasks, and that the empirical runtime model is also useful in roughly determining total learning time for a SVM using either kernel.  相似文献   

5.
Given two strings, the longest common subsequence (LCS) problem consists in computing the length of the longest string that is a subsequence of both input strings. Its generalisation, the all semi-local LCS problem, requires computing the LCS length for each string against all substrings of the other string, and for all prefixes of each string against all suffixes of the other string. We survey a number of algorithmic techniques related to the all semi-local LCS problem. We then present a number of algorithmic applications of these techniques, both existing and new. In particular, we obtain a new all semi-local LCS algorithm, with asymptotic running time matching (in the case of an unbounded alphabet) the fastest known global LCS algorithm by Masek and Paterson. We conclude that semi-local string comparison turns out to be a useful algorithmic plug-in, which unifies, and often improves on, a number of previous approaches to various substring- and subsequence-related problems. The author acknowledges the support of The University of Warwick’s DIMAP (the Centre for Discrete Mathematics and its Applications) during this work.  相似文献   

6.
One of the most beautiful and useful notions in the Mathematical Theory of Strings is that of a Period, i.e., an initial piece of a given string that can generate that string by repeating itself at regular intervals. Periods have an elegant mathematical structure and a wealth of applications [F. Mignosi and A. Restivo, Periodicity, Algebraic Combinatorics on Words, in: M. Lothaire (Ed.), Cambridge University Press, Cambridge, pp. 237–274, 2002]. At the hearth of their theory, there are two Periodicity Lemmas: one due to Lyndon and Schutzenberger [The equation aM=bNcP in a free group, Michigan Math. J. 9 (1962) 289–298], referred to as the Weak Version, and the other due to Fine and Wilf [Uniqueness theorems for periodic functions, Proc. Amer. Math. Soc. 16 (1965) 109–114]. In this paper, we investigate the notion of periodicity and the closely related one of repetition in connection with parameterized strings as introduced by Baker [Parameterized pattern matching: algorithms and applications, J. Comput. System Sci. 52(1) (1996) 28–42; Parameterized duplication in strings: algorithms and an application to software maintenance, SIAM J. Comput. 26(5) (1997) 1343–1362]. In such strings, the notion of pairwise match or “equivalence” of symbols is more relaxed than the usual one, in that it rests on some mapping, rather than identity, of symbols. It seems natural to try and extend notions of periods and periodicities to encompass parameterized strings. However, we know of no previous attempt in this direction. Our preliminary investigation yields results as follows. For periodicity, we get (a) a generalization of the Weak Version of the Periodicity Lemma for parameterized strings, showing that it is essential that the two mappings inducing the periodicity must commute; (b) a proof that an analogous of the Lemma by Fine and Wilf [Uniqueness theorems for periodic functions, Proc. Amer. Math. Soc. 16 (1965) 109–114] cannot hold for parameterized strings, even if the mappings inducing the periodicity “commute”, in a sense to be specified below; (c) a proof that parameterized strings over an alphabet of at least three letters may have a set of periods which differ from those of any binary string of the same length—whereby the parameterized analog of a classic result by Guibas and Odlyzko [String overlaps, pattern matching, and nontransitive games, J. Combin. Theory Ser. A 30 (1981) 183–208] cannot hold. We also derive necessary and sufficient conditions characterizing parameterized repetitions, which are patterns of length at least twice that of the period, and show how the notion of root differs from the standard case, and highlight some of the implications on extending algorithmic criteria previously adopted for string searching, detection of repetitions and the likes. Finally, as a corollary of our main results, we also show that binary parameterized strings behave much in the same way as non-parameterized ones with respect to periodicity and repetitions, while there is a substantial difference for strings over alphabets of at least three symbols.  相似文献   

7.
The reductive decision procedure for unavoidable strings was recently shown to have an exponential lower bound. Hence, as a special case of generalized pattern matching, the existence of an efficient algorithm deciding string unavoidability remains an interesting open question. It has been hypothesized that some combination of the four necessary conditions implied by the known decidability results would be sufficient. Three of these criteria are determined in polynomial time, and the fourth provides the needed recursion. In this paper, however, we demonstrate the existence of arbitrarily many avoidable strings meeting any extended conjunction of the four necessary conditions. These insufficiency results are achieved by analyzing the appropriate graphical interpretations of the given algorithms. We provide a new combinatorial operation on the corresponding strings and generate arbitrary counterexamples from an empirically located minimal set. Thus, string unavoidability cannot be efficiently decided by the known reductive method or its immediate implications.  相似文献   

8.
Similarity search in texts, notably in biological sequences, has received substantial attention in the last few years. Numerous filtration and indexing techniques have been created in order to speed up the solution of the problem. However, previous filters were made for speeding up pattern matching, or for finding repetitions between two strings or occurring twice in the same string. In this paper, we present an algorithm called Nimbus for filtering strings prior to finding repetitions occurring twice or more in a string, or in two or more strings. Nimbus uses gapped seeds that are indexed with a new data structure, called a bi-factor array, that is also presented in this paper. Experimental results show that the filter can be very efficient: preprocessing with Nimbus a data set where one wants to find functional elements using a multiple local alignment tool such as Glam, the overall execution time can be reduced from 7.5 hours to 2 minutes.  相似文献   

9.
《Journal of Complexity》1999,15(1):128-147
Most research on the edit distance problem and thek-differences problem considered the set of edit operations consisting of changes, insertions, and deletions. In this paper we include theswapoperation that interchanges two adjacent characters into the set of allowable edit operations, and we present anO(t min(m, n))-time algorithm for the extended edit distance problem, wheretis the edit distance between the given strings, and anO(kn)-time algorithm for the extendedk-differences problem. That is, we add swaps into the set of edit operations without increasing the time complexities of previous algorithms that consider only changes, insertions, and deletions for the edit distance andk-differences problems.  相似文献   

10.
11.
A discrete string with fixed endpoints carrying a finite number of beads is determined by the masses of beads and the distances between them. The string possesses a set of simple eigenfrequencies corresponding to harmonic eigenmodes. In this paper, the following problem is studied: to find a discrete string carrying seven beads such that its eigenfrequencies coincide with the freqiencies of the notes of the first octave of the musical scale. The problem is solved in two steps. First, the spectral inverse problem is considered, i.e., we recover the string from its spectrum and a set of constants related to the normalized eigenmodes. A procedure of solving this problem is described. One of the main results of the paper is a necessary and sufficient condition for the solvability of the spectral inverse problem. The second step is numerical realization of the procedure. Bibliography: 3 titles.  相似文献   

12.
In this paper, we study the approximate string matching problem under a string distance whose edit operations are translocations of equal length factors. We extend a graph-theoretic approach proposed by Rahman and Illiopoulos (2008) to model our problem. In the sequel, we devise efficient algorithms based on this model to solve a number of variants of the string matching problem with translocations.  相似文献   

13.
The subsequence matching problem is to decide, for given strings S and T, whether S is a subsequence of T. The string S is called the pattern and the string T the text. We consider the case of multiple texts and show how to solve the subsequence matching problem in time linear in the length of the pattern. For this purpose we build an automaton that accepts all subsequences of given texts. This automaton is called the Directed Acyclic Subsequence Graph (DASG). We prove an upper bound for its number of states. Furthermore, we consider a modification of the subsequence matching problem: given a string S and a finite language L, we are to decide whether S is a subsequence of any string in L. We suppose that a finite automaton accepting L is given and present an algorithm for building the DASG for language L. We also mention applications of the DASG to some problems related to subsequences.  相似文献   

14.
In this paper we study the Kolmogorov complexity of initial strings in infinite sequences (being inspired by [9]), and we relate it with the theory of P. Martin-Lof random sequences. Working with partial recursive functions instead of recursive functions we can localize on an apriori given recursive set the points where the complexity is small. The relation between Kolmogorov's complexity and P. Martin-Lof's universal tests enables us to show that the randomness theories for finite strings and infinite sequences are not compatible (e.g.because no universal test is sequential).We lay stress upon the fact that we work within the general framework of a non-necessarily binary alphabet.  相似文献   

15.
The Far From Most Strings Problem (FFMSP) asks for a string that is far from as many as possible of a given set of strings. All the input and the output strings are of the same length, and two strings are far if their Hamming distance is greater than or equal to a given threshold. FFMSP belongs to the class of sequence consensus problems which have applications in molecular biology, amongst others. FFMSP is NP-hard. It does not admit a constant-ratio approximation either, unless P=NP. In the last few years, heuristic and metaheuristic algorithms have been proposed for the problem, which use local search and require a heuristic, also called an evaluation function, to evaluate candidate solutions during local search. The heuristic function used, for this purpose, in these algorithms is the problem’s objective function. However, since many candidate solutions can be of the same objective value, the resulting search landscape includes many points which correspond to local maxima. In this paper, we devise a new heuristic function to evaluate candidate solutions. We then incorporate the proposed heuristic function within a Greedy Randomized Adaptive Search Procedure (GRASP), a metaheuristic originally proposed for the problem by Festa. The resulting algorithm outperforms state-of-the-art with respect to solution quality, in some cases by orders of magnitude, on both random and real data in our experiments. The results indicate that the number of local optima is considerably reduced using the proposed heuristic.  相似文献   

16.
The study and comparison of strings of symbols from a finite or an infinite alphabet is relevant to various areas of science, notably molecular biology, speech recognition, and computer science. In particular, the problem of finding the minimum “distance” between two strings (in general, two blocks of data) is of a practical importance. In this article we investigate the (string) pattern matching problem in a probabilistic framework, namely, it is assumed that both strings form an independent sequences of i.i.d. symbols. Given a text string a of length n and a pattern string b of length m, let Mm,n be the maximum number of matches between b and all m-substrings of a . Our main probabilistic result shows that for a wide range of input parameters in probability (pr.) provided m, n →∞ such that log n = o(m), where P is the probability of a match between any two symbols of these strings, and T is the probability of a match between two positions in the text string and a given position of the pattern string. We also prove that Mm,n/mP almost surely (a.s.) for log n = o(m). © 1993 John Wiley & Sons. Inc.  相似文献   

17.
It is well known that the sets of strings that define all representations of string algebras and many representations of other quotients of path algebras form a regular set, and hence are defined by finite state automata. This short article aims to explain this connection between representation theory and automata theory in elementary terms; no technical background in either representation theory or automata theory is assumed. The article describes the structure of the set of strings of a monomial algebra as a locally testable and hence regular set, and describes explicitly the construction of the automaton, illustrating the construction with an elementary example. Hence it explains how the sets of strings and bands of a monomial algebra correspond to the sets of paths and closed (non-powered) circuits in a finite graph, and how the growth rate of the set of bands is immediately visible from that graph. Presented by C. Ringel.  相似文献   

18.
The string merging problem is to determine a merged string from a given set of strings. The distinguishing property of a solution is that the total cost of editing all of the given strings into this solution is minimal. Necessary and sufficient conditions are presented for the case where this solution matches the solution to the string-to-string correction problem. A special case where deletion is the only allowed edition operation is shown to have the longest common subsequence of the strings as its solution.This research was supported by the U.S. Army Research Office.  相似文献   

19.
Kuske  Dietrich 《Order》1999,16(2):133-148
This paper deals with the automorphism group of the partial order of finite traces. We show that any group can arise as such an automorphism group if we allow arbitrary large dependence alphabets. Restricting to finite dependence alphabets, the automorphism groups are profinite and possess only finitely many simple decomposition factors. Finally, we show that the partial order associated with the Rado graph as dependence alphabet does not give rise to a homogeneous domain thereby answering an open question from Boldi, P., Cardone, F. and Sabadini, N. (1993).  相似文献   

20.
The conditional Kolmogorov complexity of a word a relative to a word b is the minimum length of a program that prints a given b as an input. We generalize this notion to quadruples of strings a, b, c, d: their joint conditional complexity K((ac)∧(bd)) is defined as the minimum length of a program that transforms a into c and transforms b into d. In this paper, we prove that the joint conditional complexity cannot be expressed in terms of the usual conditional (and unconditional) Kolmogorov complexity. This result provides a negative answer to the following question asked by A. Shen on a session of the Kolmogorov seminar at Moscow State University in 1994: Is there a problem of information processing whose complexity is not expressible in terms of the conditional (and unconditional) Kolmogorov complexity? We show that a similar result holds for the classical Shannon entropy. We provide two proofs of both results, an effective one and a “quasi-effective” one. Finally, we present a quasi-effective proof of a strong version of the following statement: there are two strings whose mutual information cannot be extracted. Previously, only a noneffective proof of that statement has been known.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号