首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 796 毫秒
1.
A permutation string is a string of symbols over the numerals 1, 2, …, n such that all permutations of the string 1 2 … n are subsequences. The search for short permutation strings arose out of studies into the complexity of shortest path algorithms by Karp and others. The results in the sequel are presented independent of such studies because they are felt to be of intrinsic combinatorial interest [1]. An algorithm for constructing permutation strings of length n2−2n+4 is given.  相似文献   

2.
Two strings parameterize match if there is a bijection defined on the alphabet that transforms the first string character by character into the second string. The problem of finding all parameterized matches of a pattern in a text has been studied in both one and two dimensions but the research has been centered on developing algorithms with good worst-case performance. We present algorithms that solve this problem in sublinear time on average for moderately repetitive patterns.  相似文献   

3.
Clustering is a fundamental problem in many scientific applications. Standard methods such as k-means, Gaussian mixture models, and hierarchical clustering, however, are beset by local minima, which are sometimes drastically suboptimal. Recently introduced convex relaxations of k-means and hierarchical clustering shrink cluster centroids toward one another and ensure a unique global minimizer. In this work, we present two splitting methods for solving the convex clustering problem. The first is an instance of the alternating direction method of multipliers (ADMM); the second is an instance of the alternating minimization algorithm (AMA). In contrast to previously considered algorithms, our ADMM and AMA formulations provide simple and unified frameworks for solving the convex clustering problem under the previously studied norms and open the door to potentially novel norms. We demonstrate the performance of our algorithm on both simulated and real data examples. While the differences between the two algorithms appear to be minor on the surface, complexity analysis and numerical experiments show AMA to be significantly more efficient. This article has supplementary materials available online.  相似文献   

4.
Clustering is an important problem in data mining. It can be formulated as a nonsmooth, nonconvex optimization problem. For the most global optimization techniques this problem is challenging even in medium size data sets. In this paper, we propose an approach that allows one to apply local methods of smooth optimization to solve the clustering problems. We apply an incremental approach to generate starting points for cluster centers which enables us to deal with nonconvexity of the problem. The hyperbolic smoothing technique is applied to handle nonsmoothness of the clustering problems and to make it possible application of smooth optimization algorithms to solve them. Results of numerical experiments with eleven real-world data sets and the comparison with state-of-the-art incremental clustering algorithms demonstrate that the smooth optimization algorithms in combination with the incremental approach are powerful alternative to existing clustering algorithms.  相似文献   

5.
The Far From Most Strings Problem (FFMSP) asks for a string that is far from as many as possible of a given set of strings. All the input and the output strings are of the same length, and two strings are far if their Hamming distance is greater than or equal to a given threshold. FFMSP belongs to the class of sequence consensus problems which have applications in molecular biology, amongst others. FFMSP is NP-hard. It does not admit a constant-ratio approximation either, unless P=NP. In the last few years, heuristic and metaheuristic algorithms have been proposed for the problem, which use local search and require a heuristic, also called an evaluation function, to evaluate candidate solutions during local search. The heuristic function used, for this purpose, in these algorithms is the problem’s objective function. However, since many candidate solutions can be of the same objective value, the resulting search landscape includes many points which correspond to local maxima. In this paper, we devise a new heuristic function to evaluate candidate solutions. We then incorporate the proposed heuristic function within a Greedy Randomized Adaptive Search Procedure (GRASP), a metaheuristic originally proposed for the problem by Festa. The resulting algorithm outperforms state-of-the-art with respect to solution quality, in some cases by orders of magnitude, on both random and real data in our experiments. The results indicate that the number of local optima is considerably reduced using the proposed heuristic.  相似文献   

6.
We find a new solution describing a homogeneous stationary axially symmetric model, which, in contrast to the Gödel model, does not contain closed timelike lines. We find exact solutions corresponding to the motion of a null string in the rotating Universe and show that these solutions crucially depend on the initial data. To obtain more detailed information on the cosmic string dynamics, we performed numerical simulations indicating essential differences in the behavior of strings and null strings in the presence of global rotation of the Universe. These numerical solutions show that the string manifests involved oscillations, varies its shape with the appearance of loops and cusps, and twists into a spiral.  相似文献   

7.
《Journal of Complexity》1999,15(1):72-127
We say that a data structure is builton-lineif, at any instant, we have the data structure corresponding to the input we have seen up to that instant. For instance, consider the suffix tree of a stringx[1, n]. An algorithm building iton-lineis such that, when we have read the firstisymbols ofx[1, n], we have the suffix tree forx[1, i]. We present a new technique, which we refer to asimplicit updates, based on which we obtain: (a) an algorithm for theon-lineconstruction of the Lsuffix tree of ann×nmatrixA—this data structure is the two-dimensional analog of the suffix tree of a string; (b) simple algorithms implementing primitive operations forLZ1-typeon-line losslessimage compression methods. Those methods, recently introduced by Storer, are generalizations ofLZ1-typecompression methods for strings. For the problem in (a), we get nearly an order of magnitude improvement over algorithms that can be derived from known techniques. For the problem in (b), we do not get an asymptotic speed-up with respect to what can be done with known techniques; rather we show that our algorithms are a natural support for the primitive operations. This may lead to faster implementations of those primitive operations. To the best of our knowledge, our technique is the first one that effectively addresses problems related to theon-lineconstruction of two-dimensional suffix trees.  相似文献   

8.
We solve two inverse spectral problems for star graphs of Stieltjes strings with Dirichlet and Neumann boundary conditions, respectively, at a selected vertex called root. The root is either the central vertex or, in the more challenging problem, a pendant vertex of the star graph. At all other pendant vertices Dirichlet conditions are imposed; at the central vertex, at which a mass may be placed, continuity and Kirchhoff conditions are assumed. We derive conditions on two sets of real numbers to be the spectra of the above Dirichlet and Neumann problems. Our solution for the inverse problems is constructive: we establish algorithms to recover the mass distribution on the star graph (i.e. the point masses and lengths of subintervals between them) from these two spectra and from the lengths of the separate strings. If the root is a pendant vertex, the two spectra uniquely determine the parameters on the main string (i.e. the string incident to the root) if the length of the main string is known. The mass distribution on the other edges need not be unique; the reason for this is the non-uniqueness caused by the non-strict interlacing of the given data in the case when the root is the central vertex. Finally, we relate of our results to tree-patterned matrix inverse problems.  相似文献   

9.
The string merging problem is to determine a merged string from a given set of strings. The distinguishing property of a solution is that the total cost of editing all of the given strings into this solution is minimal. Necessary and sufficient conditions are presented for the case where this solution matches the solution to the string-to-string correction problem. A special case where deletion is the only allowed edition operation is shown to have the longest common subsequence of the strings as its solution.This research was supported by the U.S. Army Research Office.  相似文献   

10.
Given a finite set of strings, the Median String problem consists in finding a string that minimizes the sum of the edit distances to the strings in the set. Approximations of the median string are used in a very broad range of applications where one needs a representative string that summarizes common information to the strings of the set. It is the case in classification, in speech and pattern recognition, and in computational biology. In the latter, Median String is related to the key problem of multiple alignment. In the recent literature, one finds a theorem stating the NP-completeness of the Median String for unbounded alphabets. However, in the above mentioned areas, the alphabet is often finite. Thus, it remains a crucial question whether the Median String problem is NP-complete for bounded and even binary alphabets. In this work, we provide an answer to this question and also give the complexity of the related Center String problem. Moreover, we study the parameterized complexity of both problems with respect to the number of input strings. In addition, we provide an algorithm to compute an optimal center under a weighted edit distance in polynomial time when the number of input strings is fixed.  相似文献   

11.
Given two strings, the longest common subsequence (LCS) problem consists in computing the length of the longest string that is a subsequence of both input strings. Its generalisation, the all semi-local LCS problem, requires computing the LCS length for each string against all substrings of the other string, and for all prefixes of each string against all suffixes of the other string. We survey a number of algorithmic techniques related to the all semi-local LCS problem. We then present a number of algorithmic applications of these techniques, both existing and new. In particular, we obtain a new all semi-local LCS algorithm, with asymptotic running time matching (in the case of an unbounded alphabet) the fastest known global LCS algorithm by Masek and Paterson. We conclude that semi-local string comparison turns out to be a useful algorithmic plug-in, which unifies, and often improves on, a number of previous approaches to various substring- and subsequence-related problems. The author acknowledges the support of The University of Warwick’s DIMAP (the Centre for Discrete Mathematics and its Applications) during this work.  相似文献   

12.
In this paper we discuss the evolution of physical concepts which led to the generation and development of string theories. The paper is conceived with the intention of summarizing and extending with new aspects the specific characteristics of strings which refer to the physical intuition and experiment. We hope to present new insights into the physics of strings and make it understandable from the point of view of a non-string theorist. Even if there exist some opinions that the (super)string theory appertains to the twenty-first or twenty-second century or that there are no concrete new predictions of string theory at low energies, we believe that string theory presents a rich field of research and a source of physical intuition not only for mathematicians but also for theoretical and experimental physicists. We offer as an example an atomic electron cloud which can also be interpreted in terms of a fixed point in a string theory We propose also an experiment to verify the fundamental hypotheses. Finally we deduce that the number of dimensions of spacetime must be infinite by virtue of the axiom of universality of motion.  相似文献   

13.
A bootstrap-based aggregate classifier for model-based clustering   总被引:1,自引:0,他引:1  
In model-based clustering, a situation in which true class labels are unknown and that is therefore also referred to as unsupervised learning, observations are typically classified by the Bayes modal rule. In this study, we assess whether alternative classifiers from the classification or supervised-learning literature—developed for situations in which class labels are known—can improve the Bayes rule. More specifically, we investigate the performance of bootstrap-based aggregate (bagging) rules after adapting these to the model-based clustering context. It is argued that specific issues, such as the label-switching problem, have to be carefully addressed when using bootstrap methods in model-based clustering. Our two Monte Carlo studies show that classification based on the Bayes rule is rather stable and difficult to improve by bootstrap-based aggregate rules, even for sparse data. An empirical example illustrates the various approaches described in this paper.  相似文献   

14.
Similarity search in texts, notably in biological sequences, has received substantial attention in the last few years. Numerous filtration and indexing techniques have been created in order to speed up the solution of the problem. However, previous filters were made for speeding up pattern matching, or for finding repetitions between two strings or occurring twice in the same string. In this paper, we present an algorithm called Nimbus for filtering strings prior to finding repetitions occurring twice or more in a string, or in two or more strings. Nimbus uses gapped seeds that are indexed with a new data structure, called a bi-factor array, that is also presented in this paper. Experimental results show that the filter can be very efficient: preprocessing with Nimbus a data set where one wants to find functional elements using a multiple local alignment tool such as Glam, the overall execution time can be reduced from 7.5 hours to 2 minutes.  相似文献   

15.
Clustering is the process of grouping a set of objects into classes of similar objects. In the past, clustering algorithms had a common problem that they use only one set of attributes for both partitioning the data space and measuring the similarity between objects. This problem has limited the use of the existing algorithms on some practical situation. Hence, this paper introduces a new clustering algorithm, which partitions data space by constructing a decision tree using one attribute set, and measures the degree of similarity using another. Three different partitioning methods are presented. The algorithm is explained with illustration. The performance and accuracy of the four partitioning methods are evaluated and compared.  相似文献   

16.
The consensus string problem for a metric is NP-complete   总被引:1,自引:0,他引:1  
Given a set S of strings, a consensus string of S based on consensus error is a string w that minimizes the sum of the distances between w and all the strings in S. In this paper, we show that the problem of finding a consensus string based on consensus error is NP-complete when the penalty matrix is a metric.  相似文献   

17.
A problem arising from the work of C.A.R. Hoare on parallel programming is that of deciding whether a given string ? is a “merge” of two other given strings σ and τ. We describe a polynomial time algorithm for this problem. This algorithm can easily be extended to check, in polynomial time, whether ? is a merge of any fixed number of strings. The problem for an arbitrary number of strings is shown to be NP-complete and so is unlikely to have a polynomial time algorithm.  相似文献   

18.
Clustering is one of the most widely used approaches in data mining with real life applications in virtually any domain. The huge interest in clustering has led to a possibly three-digit number of algorithms with the k-means family probably the most widely used group of methods. Besides classic bivalent approaches, clustering algorithms belonging to the domain of soft computing have been proposed and successfully applied in the past four decades. Bezdek’s fuzzy c-means is a prominent example for such soft computing cluster algorithms with many effective real life applications. More recently, Lingras and West enriched this area by introducing rough k-means. In this article we compare k-means to fuzzy c-means and rough k-means as important representatives of soft clustering. On the basis of this comparison, we then survey important extensions and derivatives of these algorithms; our particular interest here is on hybrid clustering, merging fuzzy and rough concepts. We also give some examples where k-means, rough k-means, and fuzzy c-means have been used in studies.  相似文献   

19.
All shipping liner companies divide their service regions into several rotations (strings) in order to operate their container vessels. A string is the ordered set of ports at which a container vessel will call. Each port is usually called at no more than twice along one string, although a single port may be called at several times on different strings. The size of string dictates the number of vessels required to offer a given frequency of service. In order to better use their shipping capacity, groups of Liner Service Providers sometimes make a short term agreement to merge some of their service routes (in a certain region) into one main ocean going rotation and p feeder rotations. In order to minimize the weighted sum of transit time, and fixed deployment costs, this paper proposes a mixed integer linear programming model of the network design, and an allocation of proper capacity size and frequency setting for every rotation. Given that none of the existing general-purpose MIP solvers is able to solve even very small problem instances in a reasonable time, we propose a Lagrangian decomposition approach which uses a heuristic procedure and is capable of obtaining practical and high quality solutions in reasonable times. The model will be applied on a real example, and we shall present some of the results obtained by our model which show how it facilitates a better use of assets and a significant reduction in the use of fuel, therefore allowing a more environmentally friendly service.  相似文献   

20.
The problem of file organization which we consider involves altering the placement of records on pages of a secondary storage device. In addition, we want this reorganization to be done in-place, i.e., using the file's original storage space for the newly reorganized file. The motivation for such a physical change is to improve the database system's performance. For example, by placing frequently and jointly accessed records on the same page or pages, we can try to minimize the number of page accesses made in answering a set of queeries. The optimal assignment (or reassignment) of records to clusters is exactly what record clustering algorithms attempt to do. However, record clustering algorithms usually do not solve the entire problem, i.e., they do not specify how to efficiently reorganize the file to reflect the clustering assignment which they determine. Our algorithm is a companion to general record clustering algorithms since it actually transforms the file. The problem of optimal file reorganization isNP-hard. Consequently, our reorganization algorithm is based on heuristics. The algorithm's time and space requirements are reasonable and its solution is near optimal. In addition, the reorganization problem which we consider in this paper is similar to the problem of join processing when indexes are used.The research of this author was partially supported by the National Science Foundation under grant IST-8696157.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号