首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 203 毫秒
1.
Given a finite set of strings, the Median String problem consists in finding a string that minimizes the sum of the edit distances to the strings in the set. Approximations of the median string are used in a very broad range of applications where one needs a representative string that summarizes common information to the strings of the set. It is the case in classification, in speech and pattern recognition, and in computational biology. In the latter, Median String is related to the key problem of multiple alignment. In the recent literature, one finds a theorem stating the NP-completeness of the Median String for unbounded alphabets. However, in the above mentioned areas, the alphabet is often finite. Thus, it remains a crucial question whether the Median String problem is NP-complete for bounded and even binary alphabets. In this work, we provide an answer to this question and also give the complexity of the related Center String problem. Moreover, we study the parameterized complexity of both problems with respect to the number of input strings. In addition, we provide an algorithm to compute an optimal center under a weighted edit distance in polynomial time when the number of input strings is fixed.  相似文献   

2.
A problem arising from the work of C.A.R. Hoare on parallel programming is that of deciding whether a given string ? is a “merge” of two other given strings σ and τ. We describe a polynomial time algorithm for this problem. This algorithm can easily be extended to check, in polynomial time, whether ? is a merge of any fixed number of strings. The problem for an arbitrary number of strings is shown to be NP-complete and so is unlikely to have a polynomial time algorithm.  相似文献   

3.
We consider the problem of tree template matching in ranked ordered trees, and propose a solution based on the bottom-up technique. Specifically, we transform the tree pattern matching problem to a string matching problem, by transforming the tree template and the subject tree to strings representing their postfix notation, and then use pushdown automata as the computational model. The method is analogous to the construction of string pattern matchers. The given tree template is preprocessed once, by constructing a nondeterministic pushdown automaton, which is then transformed to the equivalent deterministic one. Although we prove that the space required for preprocessing is exponential to the size of the tree template in the worst case, the space required for a specific class of tree templates is linear. The time required for the searching phase is linear to the size of the subject tree in both cases.  相似文献   

4.
We solve two inverse spectral problems for star graphs of Stieltjes strings with Dirichlet and Neumann boundary conditions, respectively, at a selected vertex called root. The root is either the central vertex or, in the more challenging problem, a pendant vertex of the star graph. At all other pendant vertices Dirichlet conditions are imposed; at the central vertex, at which a mass may be placed, continuity and Kirchhoff conditions are assumed. We derive conditions on two sets of real numbers to be the spectra of the above Dirichlet and Neumann problems. Our solution for the inverse problems is constructive: we establish algorithms to recover the mass distribution on the star graph (i.e. the point masses and lengths of subintervals between them) from these two spectra and from the lengths of the separate strings. If the root is a pendant vertex, the two spectra uniquely determine the parameters on the main string (i.e. the string incident to the root) if the length of the main string is known. The mass distribution on the other edges need not be unique; the reason for this is the non-uniqueness caused by the non-strict interlacing of the given data in the case when the root is the central vertex. Finally, we relate of our results to tree-patterned matrix inverse problems.  相似文献   

5.
We consider the problem to reconstruct the mass distribution of a string where the mass is concentrated in a finite number of points, or, equivalently, the problem to reconstruct a simply connected mass spring system with unknown masses and stiffness parameters if the following data are given. Problem 1: The spectra of the string and of a modification of the string, or. Problem 2: The spectra of two different modifications of the string. Here a modification of the string is a string which appears if we link the unknown string with another string of known mass distribution. The paper contains a necessary condition for the existence of a solution of Problem 1, and explicit formulas and an algorithm for the solutions of the Problems 1 and 2 under the condition that there exists a solution. For the case that the mass distribution of the unknown string is not discrete we consider the problem to find discrete approximations of this distribution from the respective spectral data. The methods are based on the spectral theory of generalized second order differential operators as developed by M. G. Krein  相似文献   

6.
We propose a d-dimensional model of the canonical ensemble of open self-avoiding strings. We consider the model of a solitary open string in the d-dimensional Euclidean space ? d, 2 ≤ d < 4, where the string configuration is described by the arc length L and the distance R between string ends. The distribution of the spatial size of the string is determined only by its internal physical state and interaction with the ambient medium. We establish an equation for a transformed probability density W(R,L) of the distance R similar to the known Dyson equation, which is invariant under the continuous group of renormalization transformations; this allows using the renormalization group method to investigate the asymptotic behavior of this density in the case where R→∞ and L→∞. We consider the model of an ensemble of M open strings with the mean string length over the ensemble given by \(\bar L\) , and we use the Darwin-Fowler method to obtain the most probable distribution of strings over their lengths in the limit as M →∞. Averaging the probability density W(R,L) over the canonical ensemble eventually gives the sought density 〈W(R, \(\bar L\) )〉.  相似文献   

7.
Given two strings, the longest common subsequence (LCS) problem consists in computing the length of the longest string that is a subsequence of both input strings. Its generalisation, the all semi-local LCS problem, requires computing the LCS length for each string against all substrings of the other string, and for all prefixes of each string against all suffixes of the other string. We survey a number of algorithmic techniques related to the all semi-local LCS problem. We then present a number of algorithmic applications of these techniques, both existing and new. In particular, we obtain a new all semi-local LCS algorithm, with asymptotic running time matching (in the case of an unbounded alphabet) the fastest known global LCS algorithm by Masek and Paterson. We conclude that semi-local string comparison turns out to be a useful algorithmic plug-in, which unifies, and often improves on, a number of previous approaches to various substring- and subsequence-related problems. The author acknowledges the support of The University of Warwick’s DIMAP (the Centre for Discrete Mathematics and its Applications) during this work.  相似文献   

8.
We consider the problem of tree template matching, a type of tree pattern matching, where the tree templates have some of their leaves denoted as “donʼt care”, and propose a solution based on the bottom-up technique. Specifically, we transform the tree pattern matching problem for unranked ordered trees to a string matching problem, by transforming the tree template and the subject tree to strings representing their postfix bar notation, and then propose a table-driven algorithm to solve it. The proposed algorithm is divided into two phases: the preprocessing and the searching phase. The tree template is preprocessed once, and the searching phase can be applied to many subject trees, without the need of preprocessing the tree template again. Although we prove that the space required for preprocessing is exponential in the size of the tree template in the worst case, we show that for a specific class of tree templates, the space required is linear in the size of the tree template. The time for the searching phase is linear in the size of the subject tree in the worst case. Thus, the algorithm is asymptotically optimal when one needs to search for a given tree template, of constant to logarithmic size, in many subject trees.  相似文献   

9.
Similarity search in texts, notably in biological sequences, has received substantial attention in the last few years. Numerous filtration and indexing techniques have been created in order to speed up the solution of the problem. However, previous filters were made for speeding up pattern matching, or for finding repetitions between two strings or occurring twice in the same string. In this paper, we present an algorithm called Nimbus for filtering strings prior to finding repetitions occurring twice or more in a string, or in two or more strings. Nimbus uses gapped seeds that are indexed with a new data structure, called a bi-factor array, that is also presented in this paper. Experimental results show that the filter can be very efficient: preprocessing with Nimbus a data set where one wants to find functional elements using a multiple local alignment tool such as Glam, the overall execution time can be reduced from 7.5 hours to 2 minutes.  相似文献   

10.
The subsequence matching problem is to decide, for given strings S and T, whether S is a subsequence of T. The string S is called the pattern and the string T the text. We consider the case of multiple texts and show how to solve the subsequence matching problem in time linear in the length of the pattern. For this purpose we build an automaton that accepts all subsequences of given texts. This automaton is called the Directed Acyclic Subsequence Graph (DASG). We prove an upper bound for its number of states. Furthermore, we consider a modification of the subsequence matching problem: given a string S and a finite language L, we are to decide whether S is a subsequence of any string in L. We suppose that a finite automaton accepting L is given and present an algorithm for building the DASG for language L. We also mention applications of the DASG to some problems related to subsequences.  相似文献   

11.
A cellular string of a polytope is a sequence of faces stacked on top of each other in a given direction. The poset of cellular strings, ordered by refinement, is known to be homotopy equivalent to a sphere. The subposet of coherent cellular strings is the face lattice of the fiber polytope, hence is homeomorphic to a sphere. In some special cases, every cellular string is coherent. Such polytopes are said to be all-coherent. We give a complete classification of zonotopes with the all-coherence property in terms of their oriented matroid structure. Although the face lattice of the fiber polytope in this case is not an oriented matroid invariant, we prove that the all-coherence property is invariant.  相似文献   

12.
For a string A=a1an, a reversalρ(i,j), 1?i?j?n, transforms the string A into a string A=a1ai-1ajaj-1aiaj+1an, that is, the reversal ρ(i,j) reverses the order of symbols in the substring aiaj of A. In the case of signed strings, where each symbol is given a sign + or -, the reversal operation also flips the sign of each symbol in the reversed substring. Given two strings, A and B, signed or unsigned, sorting by reversals (SBR) is the problem of finding the minimum number of reversals that transform the string A into the string B.Traditionally, the problem was studied for permutations, that is, for strings in which every symbol appears exactly once. We consider a generalization of the problem, k-SBR, and allow each symbol to appear at most k times in each string, for some k?1. The main result of the paper is an O(k2)-approximation algorithm running in time O(n). For instances with , this is the best known approximation algorithm for k-SBR and, moreover, it is faster than the previous best approximation algorithm.  相似文献   

13.
In this paper a string is a sequence of positive non-increasing real numbers which sums to one. For our purposes a fractal string is a string formed from the lengths of removed sub-intervals created by a recursive decomposition of the unit interval. By using the so-called complex dimensions of the string, the poles of an associated zeta function, it is possible to obtain detailed information about the behaviour of the asymptotic properties of the string. We consider random versions of fractal strings. We show that by using a random recursive self-similar construction, it is possible to obtain similar results to those for deterministic self-similar strings. In the case of strings generated by the excursions of stable subordinators, we show that the complex dimensions can only lie on the real line. The results allow us to discuss the geometric and spectral asymptotics of one-dimensional domains with random fractal boundary.

  相似文献   


14.
The study and comparison of strings of symbols from a finite or an infinite alphabet is relevant to various areas of science, notably molecular biology, speech recognition, and computer science. In particular, the problem of finding the minimum “distance” between two strings (in general, two blocks of data) is of a practical importance. In this article we investigate the (string) pattern matching problem in a probabilistic framework, namely, it is assumed that both strings form an independent sequences of i.i.d. symbols. Given a text string a of length n and a pattern string b of length m, let Mm,n be the maximum number of matches between b and all m-substrings of a . Our main probabilistic result shows that for a wide range of input parameters in probability (pr.) provided m, n →∞ such that log n = o(m), where P is the probability of a match between any two symbols of these strings, and T is the probability of a match between two positions in the text string and a given position of the pattern string. We also prove that Mm,n/mP almost surely (a.s.) for log n = o(m). © 1993 John Wiley & Sons. Inc.  相似文献   

15.
The consensus string problem for a metric is NP-complete   总被引:1,自引:0,他引:1  
Given a set S of strings, a consensus string of S based on consensus error is a string w that minimizes the sum of the distances between w and all the strings in S. In this paper, we show that the problem of finding a consensus string based on consensus error is NP-complete when the penalty matrix is a metric.  相似文献   

16.
This paper introduces an intuitionistic fuzzy automaton model for computing the similarity between pairs of strings. The model details the possible edit operations needed to transform any input (observed) string into a target (pattern) string by providing a membership and non-membership value between them. In the end, an algorithm is given for approximate string matching and the proposed model computes the similarity and dissimilarity between the pair of strings leading to better approximation.  相似文献   

17.
The Far From Most Strings Problem (FFMSP) asks for a string that is far from as many as possible of a given set of strings. All the input and the output strings are of the same length, and two strings are far if their Hamming distance is greater than or equal to a given threshold. FFMSP belongs to the class of sequence consensus problems which have applications in molecular biology, amongst others. FFMSP is NP-hard. It does not admit a constant-ratio approximation either, unless P=NP. In the last few years, heuristic and metaheuristic algorithms have been proposed for the problem, which use local search and require a heuristic, also called an evaluation function, to evaluate candidate solutions during local search. The heuristic function used, for this purpose, in these algorithms is the problem’s objective function. However, since many candidate solutions can be of the same objective value, the resulting search landscape includes many points which correspond to local maxima. In this paper, we devise a new heuristic function to evaluate candidate solutions. We then incorporate the proposed heuristic function within a Greedy Randomized Adaptive Search Procedure (GRASP), a metaheuristic originally proposed for the problem by Festa. The resulting algorithm outperforms state-of-the-art with respect to solution quality, in some cases by orders of magnitude, on both random and real data in our experiments. The results indicate that the number of local optima is considerably reduced using the proposed heuristic.  相似文献   

18.
The reductive decision procedure for unavoidable strings was recently shown to have an exponential lower bound. Hence, as a special case of generalized pattern matching, the existence of an efficient algorithm deciding string unavoidability remains an interesting open question. It has been hypothesized that some combination of the four necessary conditions implied by the known decidability results would be sufficient. Three of these criteria are determined in polynomial time, and the fourth provides the needed recursion. In this paper, however, we demonstrate the existence of arbitrarily many avoidable strings meeting any extended conjunction of the four necessary conditions. These insufficiency results are achieved by analyzing the appropriate graphical interpretations of the given algorithms. We provide a new combinatorial operation on the corresponding strings and generate arbitrary counterexamples from an empirically located minimal set. Thus, string unavoidability cannot be efficiently decided by the known reductive method or its immediate implications.  相似文献   

19.
In this paper boundary controllability of one-dimensional vibrating system such as the vibrating string or the vibrating beam is studied. In particular we are concerned with the question whether it is possible to transfer a given initial state of vibration into rest within a given time such that the system stays in rest when the control is turned off. This problem is rephrased as a typical trigonometric moment problem which is solved within the framework of an abstract moment problem in a Hilbert space. The results of null-controllability which are obtained are substantially based on classical results of Ingham and Redheffer concerning trigonometric inequalities and incompleteness of certain sequences of trigonometric functions, respectively. The representation of the general statements follows closely the lines of a paper of Russell. Besides a special case is treated where explicit representations of boundary controls can be given that transfer the system to a permanent rest position. This special case includes amplitude boundary control of the vibrating string and the freely supported beam.  相似文献   

20.
Fast detection of string differences is a prerequisite for string clustering problems. An example of such a problem is the identification of duplicate information in the data cleansing stage of the data mining process. The relevant algorithms allow the application of large-scale clustering techniques in order to create clusters of similar strings. The vast majority of comparisons, in such cases, is between very dissimilar strings, therefore methods that perform better at detecting large differences are preferable. This paper presents approaches which comply with this requirement, based on reformulation of the underlying shortest path problem. It is believed that such methods can lead to a family of new algorithms. An upper bound algorithm is presented, as an example, which produces promising results.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号