Similar Documents
20 similar documents found (search time: 31 ms)
1.
We consider the problem of sorting a permutation using a network of data structures, as introduced by Knuth and Tarjan. In general, the model as previously considered was restricted to networks that are directed acyclic graphs (DAGs) of stacks and/or queues. In this paper we study which are the smallest general graphs that can sort an arbitrary permutation, and how efficiently they do so. We show that certain two-node graphs can sort in time Θ(n^2) and that no simpler graph can sort all permutations. We then show that certain three-node graphs sort in time Ω(n^{3/2}), and that there exist graphs of k nodes which can sort in time Θ(n log_k n), which is optimal.
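The simplest instance of such a network is a single stack. A minimal sketch (not from the paper): a greedy pass sorts through one stack exactly when it succeeds, which by Knuth's classical result happens iff the permutation avoids the pattern 231.

```python
# Minimal sketch (not from the paper): the simplest sorting network is a
# single stack.  A permutation of 1..n is sortable through one stack exactly
# when it avoids the pattern 231 (Knuth); this greedy pass decides it.
def stack_sortable(perm):
    """Try to output 1, 2, ..., n using one auxiliary stack."""
    stack, expected = [], 1
    for x in perm:
        stack.append(x)
        # Pop whenever the stack top is the next value we need.
        while stack and stack[-1] == expected:
            stack.pop()
            expected += 1
    return not stack  # empty stack <=> the permutation was sorted

print(stack_sortable([2, 1, 3]))  # True
print(stack_sortable([2, 3, 1]))  # False: a 231 pattern defeats one stack
```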

2.
Given a finite point set Z ⊂ R^d, the covering radius of a nonempty subset X ⊆ Z is the minimum distance r_{X,Z} such that every point in Z is at a distance of at most r_{X,Z} from some point in X. This paper concerns the construction of a sequence of subsets of decreasing size, such that their covering radii are small. To this end, a method for progressive data reduction, referred to as scattered data filtering, is proposed. The resulting scheme is a composition of greedy thinning, a recursive point-removal strategy, and exchange, a postprocessing local optimization procedure. The paper proves adaptive a priori lower bounds on the minimal covering radii, which make it possible to control, at run time, the deviation of the covering radius of any current subset from the optimal value. Important computational aspects of greedy thinning and exchange are discussed. The good performance of the proposed filtering scheme is finally demonstrated by numerical examples.
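A hedged sketch of the two ingredients under my own simplified setup (brute-force distances, no exchange step): the covering radius, and a naive greedy thinning loop that removes points one at a time.

```python
# Hedged sketch (not the authors' code): covering radius of X within Z, plus
# naive greedy thinning that repeatedly removes the point whose removal
# increases the covering radius the least.
import numpy as np

def covering_radius(X, Z):
    """max over z in Z of the distance to its nearest point in X."""
    d = np.linalg.norm(Z[:, None, :] - X[None, :, :], axis=2)
    return d.min(axis=1).max()

def greedy_thinning(Z, target_size):
    X = list(range(len(Z)))
    while len(X) > target_size:
        # Remove the point whose deletion keeps the covering radius smallest.
        best = min(X, key=lambda i: covering_radius(Z[[j for j in X if j != i]], Z))
        X.remove(best)
    return Z[X]

rng = np.random.default_rng(0)
Z = rng.random((60, 2))
X = greedy_thinning(Z, 10)
print(f"covering radius of 10-point subset: {covering_radius(X, Z):.3f}")
```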

3.
We present the design of more effective and efficient genetic-algorithm-based data mining techniques that use the concept of feature selection. Explicit feature selection is traditionally done as a wrapper approach, where every candidate feature subset is evaluated by executing the data mining algorithm on that subset. In this article we present a GA that performs the tasks of mining and feature selection simultaneously, by evolving a binary code alongside the chromosome structure used for evolving the rules. We then present a wrapper approach to feature selection based on the Hausdorff distance measure. Results from applying the above techniques to a real-world data mining problem show that combining both feature selection methods provides the best performance in terms of prediction accuracy and computational efficiency.
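An illustrative sketch of the combined encoding; the chromosome layout and fitness below are assumptions for illustration, not the authors' design.

```python
# Illustrative sketch (details are assumptions, not the authors' encoding):
# each individual carries rule thresholds plus a binary feature mask, so
# feature selection evolves together with the rule itself.
import random

N_FEATURES = 8

def random_individual():
    thresholds = [random.random() for _ in range(N_FEATURES)]
    mask = [random.randint(0, 1) for _ in range(N_FEATURES)]  # 1 = feature used
    return thresholds, mask

def rule_predicts(individual, sample):
    thresholds, mask = individual
    # The rule fires only on features whose mask bit is set.
    return all(sample[i] >= thresholds[i] for i in range(N_FEATURES) if mask[i])

def fitness(individual, data):
    """data: list of (sample, label); reward accuracy, penalize mask size."""
    thresholds, mask = individual
    correct = sum(rule_predicts(individual, x) == y for x, y in data)
    return correct / len(data) - 0.01 * sum(mask)

data = [((0.6,) * 8, True), ((0.1,) * 8, False)]
print(fitness(random_individual(), data))
```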

4.
Association rule mining from a transaction database (TDB) requires the detection of frequently occurring patterns, called frequent itemsets (FIs), whereby the number of FIs may be potentially huge. Recent approaches to FI mining use the closed itemset paradigm to limit the mining effort to a subset of the entire FI family, the frequent closed itemsets (FCIs). We show here how FCIs can be mined incrementally yet efficiently whenever a new transaction is added to a database whose mining results are available. Our approach for mining FIs in dynamic databases relies on recent results about lattice incremental restructuring and lattice construction. The fundamentals of the incremental FCI mining task are discussed, and its reduction to the problem of lattice update, via the CI family, is made explicit. The related structural results underlie two algorithms for updating the set of FCIs of a given TDB upon the insertion of a new transaction. A straightforward method searches for necessary completions throughout the entire CI family, whereas a second method exploits lattice properties to limit the search to CIs that share at least one item with the new transaction. Efficient implementations of the parsimonious method are discussed in the paper, together with results from a preliminary study of the method's practical performance.
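A minimal sketch of the underlying notion (not the paper's incremental algorithm): the closure of an itemset is the intersection of the transactions supporting it, and an itemset is closed iff it equals its own closure.

```python
# Minimal sketch (not the paper's algorithm): the closure of an itemset in a
# transaction database is the intersection of all transactions containing it;
# an itemset is closed iff it equals its own closure.
def closure(itemset, transactions):
    itemset = frozenset(itemset)
    supporting = [t for t in transactions if itemset <= t]
    if not supporting:
        return itemset
    return frozenset.intersection(*supporting)

tdb = [frozenset("abc"), frozenset("abd"), frozenset("ab")]
print(sorted(closure({"a"}, tdb)))       # ['a', 'b'] -> {a} is not closed
print(sorted(closure({"a", "b"}, tdb)))  # ['a', 'b'] -> {a, b} is closed
```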

5.
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploit the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
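A sketch of the communication pattern at issue, with shapes and data assumed for illustration: each node computes partial centroid sums locally, and the straightforward formulation then needs a global reduction over all nodes at every iteration.

```python
# Sketch of the point being made (assumed shapes, not the paper's code): in
# the straightforward parallel k-means, every iteration ends with a global
# reduction combining per-node partial sums of the centroids.
import numpy as np

def local_step(points, centroids):
    """One node's work: assign points, return partial sums and counts."""
    k = len(centroids)
    labels = np.linalg.norm(points[:, None] - centroids[None], axis=2).argmin(axis=1)
    sums = np.zeros_like(centroids)
    counts = np.zeros(k)
    for j in range(k):
        sums[j] = points[labels == j].sum(axis=0)
        counts[j] = (labels == j).sum()
    return sums, counts

rng = np.random.default_rng(1)
chunks = [rng.random((100, 2)) for _ in range(4)]   # data held on 4 "nodes"
centroids = rng.random((3, 2))
partials = [local_step(c, centroids) for c in chunks]
# The global reduction the paper seeks to remove: a sum over all nodes.
total_sums = sum(s for s, _ in partials)
total_counts = sum(c for _, c in partials)
centroids = total_sums / total_counts[:, None]
```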

6.
Scalability of clustering algorithms is a critical issue facing the data mining community. One way to handle this issue is to use only a subset of all instances. This paper develops an optimization-based approach to the partitional clustering problem, using an algorithm specifically designed for noisy performance, the problem that arises when only a subset of instances is used. Numerical results show that computation time can be dramatically reduced by using a partial set of instances without sacrificing solution quality. Moreover, these gains become more pronounced as the problem size grows.
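A toy sketch (setup assumed) of why subset-based evaluation is noisy: the clustering objective estimated on a random sample fluctuates around the exact value.

```python
# Toy sketch (assumptions mine): estimating the clustering objective on a
# random subset gives a cheap but noisy evaluation, which is why the paper
# pairs subsetting with an optimizer built for noisy performance.
import numpy as np

def sse(points, centroids):
    """Sum of squared distances to the nearest centroid."""
    d2 = ((points[:, None] - centroids[None]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

rng = np.random.default_rng(2)
data = rng.random((10_000, 2))
centroids = rng.random((5, 2))

exact = sse(data, centroids)
sample = data[rng.choice(len(data), size=500, replace=False)]
noisy = sse(sample, centroids) * len(data) / len(sample)  # rescaled estimate
print(f"exact {exact:.1f}  vs  subset estimate {noisy:.1f}")
```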

7.
It is shown how to choose the smoothing parameter when a smoothing periodic spline of degree 2m-1 is used to reconstruct a smooth periodic curve from noisy ordinate data. The noise is assumed "white", and the true curve is assumed to be in the Sobolev space W_2^(2m) of periodic functions with absolutely continuous v-th derivative, v = 0, 1, ..., 2m-1, and square-integrable 2m-th derivative. The criterion is minimum expected square error, averaged over the data points. The dependence of the optimal smoothing parameter on the sample size, the noise variance, and the smoothness of the true curve is found explicitly.
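The paper's explicit formula is not reproduced here; the standard decomposition behind this criterion, for any linear smoother of the form A(λ)y, reads:

```latex
% Standard bias-variance decomposition (a known general fact, not the
% paper's explicit formula): with y_i = f(t_i) + e_i, e_i white noise of
% variance sigma^2, and \hat f_\lambda = A(\lambda) y,
\[
  R(\lambda)
  = \frac{1}{n}\,\mathbb{E}\sum_{i=1}^{n}\bigl(\hat f_\lambda(t_i) - f(t_i)\bigr)^2
  = \underbrace{\frac{1}{n}\bigl\lVert (I - A(\lambda))\,\mathbf f \bigr\rVert^2}_{\text{squared bias}}
  + \underbrace{\frac{\sigma^2}{n}\,\operatorname{tr}\bigl(A(\lambda)^{\top} A(\lambda)\bigr)}_{\text{variance}},
\]
% and the optimal lambda trades these terms off as functions of the sample
% size n, the noise variance sigma^2, and the smoothness of f.
```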

8.
We prove some criteria for uniform hyperbolicity based on the periodic points of the transformation. More precisely, if a mild hyperbolicity condition holds for the periodic points of every diffeomorphism in a residual subset of a C^1-open set U, then there exists an open and dense subset A ⊂ U of Axiom A diffeomorphisms. Moreover, we also prove a noninvertible version of the Ergodic Closing Lemma, which we use to prove a counterpart of this result for local diffeomorphisms.

9.
We propose and discuss a new Lepp-surface method able to produce a small triangular approximation of huge sets of terrain grid data by using a two-goal strategy that ensures both a small approximation error and well-shaped 3D triangles. This is a refinement method which starts with a coarse initial triangulation of the input data and incrementally selects and adds data points into the mesh as follows: for the edge e having the highest error in the mesh, one or two points close to the (one or two) terminal edges associated with e are inserted into the mesh. The edge error is computed by adding the triangle approximation errors of the two triangles that share e, while each L2-norm triangle error is computed by using a curvature tensor (a good approximation of the surface) at a representative point associated with both triangles. The method produces triangular approximations that capture well the relevant features of the terrain surface by naturally producing well-shaped triangles. We compare our method with a pure L2-norm optimization method.
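A hedged sketch of the edge-error bookkeeping; the representative-point error below is simplified to a vertical residual at the centroid, rather than the paper's curvature-tensor L2 error.

```python
# Hedged sketch (simplified, not the paper's error measure): the edge error
# driving refinement is the sum of the approximation errors of the two
# triangles sharing that edge, each evaluated at a representative point.
import numpy as np

def triangle_error(tri_vertices, height_fn):
    """Error at the centroid: |true height - planar interpolated height|."""
    centroid = tri_vertices.mean(axis=0)
    interpolated_z = centroid[2]   # triangles are planar, so this is exact
    true_z = height_fn(centroid[0], centroid[1])
    return abs(true_z - interpolated_z)

def edge_error(tri_a, tri_b, height_fn):
    # Refinement picks the edge with the highest combined error.
    return triangle_error(tri_a, height_fn) + triangle_error(tri_b, height_fn)
```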

10.
In an earlier article, an ordinal multiple criteria model was presented in which each member of a set of alternatives was given an evaluation on each member of a set K of criteria. In this paper we extend this concept to the situation where each alternative i can be assessed in terms of only a subset K_i of K. Various models are presented for dealing with this partial-criteria case, and the pros and cons of these models are discussed.

11.
Data mining is generally defined as the science of nontrivial extraction of implicit, previously unknown, and potentially useful information from datasets. Many websites on the Internet provide extensive information about products and allow users to post comments on various products and rate them on a scale of 1 to 5. During the past decade, the need for intelligent algorithms for calculating and organizing extremely large sets of data has grown exponentially. In this article we investigate the extent to which a product's average user rating can be predicted using a manageable subset of a data set. For this we use a prediction model based on a linearization algorithm and sketch how an inverse problem can be formulated to yield a smooth local volatility function of user ratings. The MAPLE programs that implement the proposed algorithm show that the method is reasonably accurate for the reconstruction of the volatility of user ratings, which is useful both for accurate user predictions and for computing sensitivity.
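An illustrative sketch under my own simplified assumptions (plain sampling rather than the article's linearization model): predicting the average rating from a subset, with a standard error as a crude volatility proxy.

```python
# Illustrative sketch (my assumptions, not the article's model): predict a
# product's average rating from a manageable subset of its ratings and
# report the sampling uncertainty of that prediction.
import random
import statistics

def predict_average(ratings, subset_size, seed=0):
    random.seed(seed)
    sample = random.sample(ratings, min(subset_size, len(ratings)))
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / len(sample) ** 0.5
    return mean, stderr

ratings = [random.choice([3, 4, 4, 5, 5]) for _ in range(10_000)]
mean, stderr = predict_average(ratings, subset_size=200)
print(f"predicted average rating: {mean:.2f} +/- {stderr:.2f}")
```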

12.
In this article, we address the problem of approximating data points by C1-smooth polynomial spline curves or surfaces using the L1 norm. The use of this norm helps to preserve the data shape and reduces extraneous oscillations. In our approach, we introduce a new functional which enables direct control of the distance between the data points and the resulting spline solution. The computational complexity of the minimization algorithm is nonlinear. A local minimization method using sliding windows allows approximation splines to be computed with linear complexity. This strategy appears to be more robust than a global method when applied to large data sets. When the data are noisy, we apply this method iteratively to globally smooth the solution while preserving the data shape. The method is applied to image denoising.
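A sketch of the L1 idea on a single window, under assumed simplifications (a plain polynomial fit rather than the authors' C1 spline construction): minimizing the sum of absolute residuals is a linear program.

```python
# Sketch on one window (assumed setup, not the authors' spline construction):
# minimizing sum_i |p(x_i) - y_i| is an LP with slack variables t_i and the
# coefficients split into positive and negative parts.
import numpy as np
from scipy.optimize import linprog

def l1_polyfit(x, y, degree):
    n = len(x)
    V = np.vander(x, degree + 1)            # Vandermonde design matrix
    # Variables [c+, c-, t]; minimize sum(t) s.t. -t <= V(c+ - c-) - y <= t.
    c = np.concatenate([np.zeros(2 * (degree + 1)), np.ones(n)])
    A_ub = np.block([[V, -V, -np.eye(n)], [-V, V, -np.eye(n)]])
    b_ub = np.concatenate([y, -y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
    return res.x[: degree + 1] - res.x[degree + 1 : 2 * (degree + 1)]

x = np.linspace(0, 1, 30)
y = x**2 + np.where(np.arange(30) == 15, 5.0, 0.0)  # one large outlier
print(l1_polyfit(x, y, 2))  # close to [1, 0, 0]: the outlier barely moves the fit
```

Note the shape-preserving behavior the abstract describes: the single outlier leaves the L1 fit essentially unchanged, where a least-squares fit would bend toward it.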

13.
We present a new method, called UTAGMS, for multiple criteria ranking of alternatives from a set A using a set of additive value functions which result from an ordinal regression. The preference information provided by the decision maker is a set of pairwise comparisons on a subset of alternatives A^R ⊆ A, called reference alternatives. The preference model built via ordinal regression is the set of all additive value functions compatible with the preference information. Using this model, one can define two relations on the set A: the necessary weak preference relation, which holds for two alternatives a, b from A if and only if a is preferred to b for all compatible value functions, and the possible weak preference relation, which holds for this pair if and only if a is preferred to b for at least one compatible value function. These relations establish a necessary and a possible ranking of alternatives from A, being, respectively, a partial preorder and a strongly complete relation. The UTAGMS method is intended to be used interactively, with an increasing subset A^R and a progressive statement of pairwise comparisons. When no preference information is provided, the necessary weak preference relation is a weak dominance relation, and the possible weak preference relation is a complete relation. Every new pairwise comparison of reference alternatives for which the dominance relation does not hold enriches the necessary relation and impoverishes the possible relation, so that the two converge as the preference information grows. By distinguishing necessary and possible consequences of the preference information on the complete set of actions, UTAGMS answers questions of robustness analysis. Moreover, the method can support the decision maker when his/her preference statements cannot be represented in terms of an additive value function. The method is illustrated by an example solved using the UTAGMS software. Some extensions of the method are also presented.
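A hedged sketch of the necessary-preference test, with the paper's general additive value functions simplified to plain weighted sums (an assumption, not the UTAGMS model): a is necessarily preferred to b iff its value is at least b's for every compatible weight vector, which one linear program decides.

```python
# Hedged sketch: a simplified stand-in for the UTAGMS idea, using plain
# weighted sums instead of the paper's general additive value functions.
# a is necessarily preferred to b iff w.a >= w.b for EVERY weight vector w
# compatible with the decision maker's pairwise comparisons.
import numpy as np
from scipy.optimize import linprog

def necessarily_preferred(a, b, comparisons, n_criteria):
    """comparisons: list of (u, v) meaning 'u is weakly preferred to v'."""
    # Compatible weights: w >= 0, sum(w) = 1, w.(u - v) >= 0 per comparison.
    A_ub = np.array([np.subtract(v, u) for u, v in comparisons])  # w.(v-u) <= 0
    res = linprog(np.subtract(a, b),                # minimize w.(a - b)
                  A_ub=A_ub if len(comparisons) else None,
                  b_ub=np.zeros(len(comparisons)) if len(comparisons) else None,
                  A_eq=np.ones((1, n_criteria)), b_eq=[1.0], method="highs")
    return res.fun >= -1e-9   # worst-case value difference still nonnegative

# Two criteria; the DM states that (0.9, 0.1) is preferred to (0.2, 0.8).
print(necessarily_preferred((0.8, 0.5), (0.3, 0.4),
                            [((0.9, 0.1), (0.2, 0.8))], 2))   # True
```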

14.
The original rough set approach proved to be very useful in dealing with inconsistency problems arising from information granulation. It operates on a data table composed of a set U of objects (actions) described by a set Q of attributes. Its basic notions are: the indiscernibility relation on U, the lower and upper approximation of either a subset or a partition of U, dependence and reduction of attributes from Q, and decision rules derived from lower approximations and boundaries of subsets identified with decision classes. The original rough set idea fails, however, when preference orders of attribute domains (criteria) are to be taken into account. Precisely, it cannot handle inconsistencies arising from violation of the dominance principle. This kind of inconsistency is characteristic of the preferential information used in multicriteria decision analysis (MCDA) problems, like sorting, choice or ranking. In order to deal with it, a number of methodological changes to the original rough set theory are necessary. The main change is the substitution of the indiscernibility relation by a dominance relation, which permits approximation of ordered sets in multicriteria sorting. To approximate preference relations in multicriteria choice and ranking problems, another change is necessary: substitution of the data table by a pairwise comparison table, where each row corresponds to a pair of objects described by binary relations on particular criteria. In all these MCDA problems, the new rough set approach ends with a set of decision rules playing the role of a comprehensive preference model. It is more general than the classical functional or relational model, and it is more understandable for users because of its natural syntax. In order to work out a recommendation in one of the MCDA problems, we propose exploitation procedures for the set of decision rules. Finally, some other recently obtained results are given: rough approximations by means of similarity relations, rough set handling of missing data, comparison of the rough set model with Sugeno and Choquet integrals, and results on the equivalence of a decision rule preference model and a conjoint measurement model which is neither additive nor transitive.
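A minimal sketch (mine, not the paper's) of the dominance principle whose violations the approach is designed to handle: an object at least as good on every criterion must not be assigned to a worse class.

```python
# Minimal sketch (not the paper's code): the dominance principle says an
# object that is at least as good on every criterion must not be assigned
# to a worse decision class; the violating pairs are the inconsistencies
# the dominance-based approach is built to handle.
def dominates(x, y):
    return all(a >= b for a, b in zip(x, y))

def dominance_violations(objects):
    """objects: list of (criteria_vector, ordinal_class)."""
    return [(x, y)
            for x, cx in objects
            for y, cy in objects
            if dominates(x, y) and cx < cy]

table = [((3, 2), 2), ((3, 3), 1), ((1, 1), 1)]
print(dominance_violations(table))  # [((3, 3), (3, 2))]: better scores, worse class
```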

15.
Data mining aims to find patterns in organizational databases. However, most mining techniques do not consider knowledge of the quality of the database. In this work, we show how to incorporate into classification mining recent advances in the data quality field that view a database as the product of an imprecise manufacturing process whose flaws/defects are captured in quality matrices. We develop a general-purpose method of incorporating data quality matrices into the data mining classification task. Our work differs from existing data preparation techniques since, while other approaches detect and fix errors to ensure consistency with the entire data set, our work makes use of a priori knowledge of how the data is produced/manufactured.
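A hedged sketch of one assumed form such a quality matrix could take: Q[i, j] as the probability that a record with true class i is stored with class j, so observed class frequencies can be corrected by inverting Q.

```python
# Hedged sketch (assumed form of a "quality matrix", not the paper's exact
# construction): Q[i, j] = probability that a record whose true class is i
# was stored with class j; observed frequencies are corrected by inverting Q.
import numpy as np

Q = np.array([[0.9, 0.1],     # 10% of true class-0 records mislabeled as 1
              [0.2, 0.8]])    # 20% of true class-1 records mislabeled as 0
observed = np.array([0.62, 0.38])        # class frequencies seen in the DB

true_estimate = np.linalg.solve(Q.T, observed)
print(true_estimate)   # recovered true class distribution: [0.6, 0.4]
```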

16.
The space-time monopole equation is the reduction of the anti-self-dual Yang-Mills equations from R^{2,2} to R^{2,1}. This equation is a nonlinear wave equation and can be encoded in a Lax pair. An equivalent Lax pair is used by Dai and Terng to construct monopoles with continuous scattering data; the equation can then be linearized by the scattering data, allowing one to use the inverse scattering method to solve the Cauchy problem with rapidly decaying small initial data. In this paper, we use the language of holomorphic bundles and the transversality of certain maps, parametrized by initial data, to exhibit a larger class of initial data for which the scattering method solves the Cauchy problem of the monopole equation up to gauge transformation.

17.
Traditionally, data envelopment analysis models assume total flexibility in weight selection, though this assumption can lead to several variables being ignored in determining the efficiency score. Existing methods constrain weight selection to a predefined range, thus removing possible feasible solutions. As such, in this paper we propose the symmetric weight assignment technique (SWAT) that does not affect feasibility and rewards decision making units (DMUs) that make a symmetric selection of weights. This allows for a method of weight restrictions that does not require preference constraints on the variables. Moreover, we show that the SWAT method may be used to differentiate among efficient DMUs.
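For contrast, a sketch of the classic CCR multiplier model with total weight flexibility (the baseline SWAT modifies, not SWAT itself):

```python
# Hedged sketch (classic CCR multiplier model, not the SWAT formulation):
# a DMU's efficiency is the best weighted-output score achievable under
# total weight flexibility, normalizing its own weighted inputs to 1.
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(k, X, Y):
    """X: inputs (dmus x m), Y: outputs (dmus x s); returns DMU k's score."""
    n, m = X.shape
    s = Y.shape[1]
    # Variables [v (input weights), u (output weights)]; maximize u.y_k.
    c = np.concatenate([np.zeros(m), -Y[k]])
    A_ub = np.hstack([-X, Y])                 # u.y_j - v.x_j <= 0 for all j
    A_eq = np.concatenate([X[k], np.zeros(s)]).reshape(1, -1)   # v.x_k = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=[1.0], method="highs")
    return -res.fun

X = np.array([[2.0], [4.0]])     # one input
Y = np.array([[1.0], [1.0]])     # one output: DMU 0 is the efficient one
print(ccr_efficiency(0, X, Y), ccr_efficiency(1, X, Y))   # 1.0  0.5
```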

18.
Although support vector regression models are being used successfully in various applications, the size of business datasets, with millions of observations and thousands of variables, makes training them difficult, if not impossible. This paper introduces the Row and Column Selection Algorithm (ROCSA) to select a small but informative dataset for training support vector regression models with standard SVM tools. ROCSA uses ε-SVR models with L1-norm regularization of the dual and primal variables for the row and column selection steps, respectively. The first step involves parallel processing of data chunks and selects a fraction of the original observations that are either representative of the pattern identified in the chunk or represent those observations that do not fit the identified pattern. The column selection step dramatically reduces the number of variables and the multicollinearity in the dataset, increasing the interpretability of the resulting models and their ease of maintenance. Evaluated on six retail datasets from two countries and a publicly available research dataset, the reduced ROCSA training data improves predictive accuracy on average by 39% compared with the original dataset when trained with standard SVM tools. Comparison with the ε-SSVR method using the reduced kernel technique shows similar performance improvement. Training a standard SVM tool on the ROCSA-selected observations improves predictive accuracy on average by 21% compared to the practical approach of random sampling.
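A hedged sketch of the row-selection intuition only (not ROCSA itself): in ε-SVR the support vectors carry the fitted pattern, so a model trained on a chunk already flags a small informative subset of rows.

```python
# Hedged sketch (not ROCSA): the row-selection intuition is that in an
# epsilon-SVR only the support vectors carry the fitted pattern, so a model
# trained on a data chunk already flags a small informative subset of rows.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.random((2_000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(0, 0.1, 2_000)

model = SVR(kernel="linear", epsilon=0.2, C=1.0).fit(X, y)
selected_rows = model.support_            # indices of the informative rows
print(f"kept {len(selected_rows)} of {len(X)} rows")
```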

19.
We show that the C*-algebra of the regular representation of a discrete group G on a subset Σ of G is the reduced C*-algebra of an r-discrete groupoid whose space of units is totally disconnected and contains Σ as a dense subset. The C*-algebra of quasicrystals, some Cuntz-Krieger and crossed product algebras, and Wiener-Hopf algebras are particular cases of this construction.

20.
In this paper, we introduce a new kind of spectrum for C_{·0}-class contractions. Since the elements of this spectrum are functions rather than numbers, we call it the functional spectrum. The functional spectrum is a "large" closed subset of the Hardy space over the unit disk, and in many cases there is a canonical embedding of the classical spectrum into the functional spectrum. The study is carried out in the setting of the Hardy space over the bidisk, H^2(D^2), on which every C_{·0}-class contraction has a representation. A key tool is the reduction operator, which also gives rise to an equivalent statement of the Invariant Subspace Problem.
