首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Classification of samples into two or multi-classes is to interest of scientists in almost every field. Traditional statistical methodology for classification does not work well when there are more variables (p) than there are samples (n) and it is highly sensitive to outlying observations. In this study, a robust partial least squares based classification method is proposed to handle data containing outliers where $n\ll p.$ The proposed method is applied to well-known benchmark datasets and its properties are explored by an extensive simulation study.  相似文献   

2.
In this article, the problem of classifying a new observation vector into one of the two known groups Πi,i=1,2, distributed as multivariate normal with common covariance matrix is considered. The total number of observation vectors from the two groups is, however, less than the dimension of the observation vectors. A sample-squared distance between the two groups, using Moore-Penrose inverse, is introduced. A classification rule based on the minimum distance is proposed to classify an observation vector into two or several groups. An expression for the error of misclassification when there are only two groups is derived for large p and n=O(pδ),0<δ<1.  相似文献   

3.
In this article, we present a randomized dynamic cluster algorithm for large data sets. It is based on the restricted random walk cluster algorithm by Schöll and Schöll-Paschinger that has given good results in past studies. We discuss different approaches for the clustering of dynamic data sets. In contrast to most of these methods, dynamic restricted random walk clustering is also efficient for a small percentage of changes in the data set and has the additional advantage that the updates asymptotically produce the same clusters as a reclustering with the static variant; there is thus no need for any reclustering ever. In addition, the method has a relatively low computational complexity which enables it to cluster large data sets.  相似文献   

4.
5.
A branch-and-bound algorithm for the binary knapsack problem is presented which uses a combined stack and deque for storing the tree and the corresponding LP-relaxation. A reduction scheme is used to reduce the problem size. The algorithm was implemented in FORTRAN. Computational experience is based on 600 randomly generated test problems with up to 9000 zero-one variables. The average solution times (excluding an initial sorting step) increase linearly with problem size and compare favorably with other codes designed to solve binary knapsack problems.  相似文献   

6.
Cancer classification using genomic data is one of the major research areas in the medical field. Therefore, a number of binary classification methods have been proposed in recent years. Top Scoring Pair (TSP) method is one of the most promising techniques that classify genomic data in a lower dimensional subspace using a simple decision rule. In the present paper, we propose a supervised classification technique that utilizes incremental generalized eigenvalue and top scoring pair classifiers to obtain higher classification accuracy with a small training set. We validate our method by applying it to well known microarray data sets.  相似文献   

7.
8.
The chaos theorems show that given almost any alternatives x and y, there exists voting sequence from x to y. However, proofs of such results have been purely existential; that is, there is no algorithm by which such a voting path can be constructed. In this paper, we present such an algorithm for one standard example. Furthermore, it is shown that the algorithm has the property that the voting sequence involves the fewest possible number of steps.  相似文献   

9.
An efficient algorithm for solving inequalities   总被引:1,自引:0,他引:1  
An efficient algorithm for solving a finite system of inequalities in a finite number of iterations is described and analyzed.This work was supported by the UK Science and Engineering Research Council  相似文献   

10.
A simple but efficient algorithm is presented for linear programming. The algorithm computes the projection matrix exactly once throughout the computation unlike that of Karmarkar’s algorithm where in the projection matrix is computed at each and every iteration. The algorithm is best suitable to be implemented on a parallel architecture. Complexity of the algorithm is being studied.  相似文献   

11.
Random forests are a commonly used tool for classification and for ranking candidate predictors based on the so-called variable importance measures. These measures attribute scores to the variables reflecting their importance. A drawback of variable importance measures is that there is no natural cutoff that can be used to discriminate between important and non-important variables. Several approaches, for example approaches based on hypothesis testing, were developed for addressing this problem. The existing testing approaches require the repeated computation of random forests. While for low-dimensional settings those approaches might be computationally tractable, for high-dimensional settings typically including thousands of candidate predictors, computing time is enormous. In this article a computationally fast heuristic variable importance test is proposed that is appropriate for high-dimensional data where many variables do not carry any information. The testing approach is based on a modified version of the permutation variable importance, which is inspired by cross-validation procedures. The new approach is tested and compared to the approach of Altmann and colleagues using simulation studies, which are based on real data from high-dimensional binary classification settings. The new approach controls the type I error and has at least comparable power at a substantially smaller computation time in the studies. Thus, it might be used as a computationally fast alternative to existing procedures for high-dimensional data settings where many variables do not carry any information. The new approach is implemented in the R package vita.  相似文献   

12.
For high dimensional data sets the sample covariance matrix is usually unbiased but noisy if the sample is not large enough. Shrinking the sample covariance towards a constrained, low dimensional estimator can be used to mitigate the sample variability. By doing so, we introduce bias, but reduce variance. In this paper, we give details on feasible optimal shrinkage allowing for time series dependent observations.  相似文献   

13.
14.
Given a simple polygon P with two vertices u and v, the three-guard problem asks whether three guards can move from u to v such that the first and third guards are separately on two boundary chains of P from u to v and the second guard is always kept to be visible from two other guards inside P. It is a generalization of the well-known two-guard problem, in which two guards move on the boundary chains from u to v and are always kept to be mutually visible. In this paper, we introduce the concept of link-2-ray shots, which can be considered as ray shots under the notion of link-2-visibility. Then, we show a one-to-one correspondence between the structure of the restrictions placed on the motion of two guards and the one placed on the motion of three guards, and generalize the solution for the two-guard problem to that for the three-guard problem. We can decide whether there exists a solution for the three-guard problem in O(nlogn) time, and if so generate a walk in O(nlogn+m) time, where n denotes the number of vertices of P and the size of the optimal walk. This improves upon the previous time bounds O(n2) and O(n2logn), respectively. Moreover, our results can be used to solve other more sophisticated geometric problems.  相似文献   

15.
We report on some exceptionally good results in the solution of randomly generated 3-satisfiability instances using the “record-to-record travel (RRT)” local search method. When this simple, but less-studied algorithm is applied to random one-million variable instances from the problem's satisfiable phase, it seems to find satisfying truth assignments almost always in linear time, with the coefficient of linearity depending on the ratio α of clauses to variables in the generated instances. RRT has a parameter for tuning “greediness”. By lessening greediness, the linear time phase can be extended up to very close to the satisfiability threshold αc. Such linear time complexity is typical for random-walk based local search methods for small values of α. Previously, however, it has been suspected that these methods necessarily lose their time linearity far below the satisfiability threshold. The only previously introduced algorithm reported to have nearly linear time complexity also close to the satisfiability threshold is the survey propagation (SP) algorithm. However, SP is not a local search method and is more complicated to implement than RRT. Comparative experiments with the WalkSAT local search algorithm show behavior somewhat similar to RRT, but with the linear time phase not extending quite as close to the satisfiability threshold.  相似文献   

16.
We consider the problem of enumerating triangulations of n points in the plane in general position. We introduce a tree of triangulations and present an algorithm for enumerating triangulations in O(loglogn) time per triangulation. It improves the previous bound by almost linear factor.  相似文献   

17.
We give a new heuristic algorithm for minimum matching problems and apply it to the Euclidean problem with random vertices in 2 dimensions. The algorithm is based on simulated annealing and performs in practice faster than previous heuristic algorithms yielding suboptimal solutions of the same good quality. From configurations with up toN=20.000 vertices in the unit square we estimate that the length of a minimum matching scales asymptotically asLN with (=0.3123±0.0016.
Zusammenfassung Wir stellen einen neuen heuristischen Algorithmus für minimale Matching-Probleme vor und wenden diesen auf das euklidische Problem mit zufÄlliger Punkteverteilung in 2 Dimensionen an. Auf Simulated Annealing basierend lÄuft der Algorithmus schneller als frühere heuristische Algorithmen und erreicht dabei suboptimale Lösungen gleich guter QualitÄt. Aus Konfigurationen mit bis zuN=20.000 Punkten im Einheitsquadrat schÄtzen wir, da\ für die LÄnge des minimalen Matchings asymptotischLN mit=0.3123±0.0016 gilt.
  相似文献   

18.
Maximal margin based frameworks have emerged as a powerful tool for supervised learning. The extension of these ideas to the unsupervised case, however, is problematic since the underlying optimization entails a discrete component. In this paper, we first study the computational complexity of maximal hard margin clustering and show that the hard margin clustering problem can be precisely solved in O(n d+2) time where n is the number of the data points and d is the dimensionality of the input data. However, since it is well known that many datasets commonly ‘express’ themselves primarily in far fewer dimensions, our interest is in evaluating if a careful use of dimensionality reduction can lead to practical and effective algorithms. We build upon these observations and propose a new algorithm that gradually increases the number of features used in the separation model in each iteration, and analyze the convergence properties of this scheme. We report on promising numerical experiments based on a ‘truncated’ version of this approach. Our experiments indicate that for a variety of datasets, good solutions equivalent to those from other existing techniques can be obtained in significantly less time.  相似文献   

19.
We propose using graph theoretic results to develop an infrastructure that tracks movement from a display of one set of variables to another. The illustrative example throughout is the real-time morphing of one scatterplot into another. Hurley and Oldford (J Comput Graph Stat 2010) made extensive use of the graph having variables as nodes and edges indicating a paired relationship between them. The present paper introduces several new graphs derivable from this one whose traversals can be described as particular movements through high dimensional spaces. These are connected to known results in graph theory and the graph theoretic results applied to the problem of visualizing high-dimensional data.  相似文献   

20.
In minimizing interior penalty functions, most of the computational time is spent on the one-dimensional search. This paper presents a method for performing this search on barrier functions which is significantly faster than current techniques. The method exploits the special structure of barrier functions. Comparative computational results are given for a set of six test problems.This research was partially supported by the National Aeronautics and Space Administration under Research Grant NSG 110-61 and by the Office of Naval Research under Grant No. N00014-67-A-0404-0010.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号