期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Partial least squares classification for high dimensional data using the PCOUT algorithm

Asuman Turkmen Nedret Billor 《Computational Statistics》2013,28(2):771-788

Classification of samples into two or multi-classes is to interest of scientists in almost every field. Traditional statistical methodology for classification does not work well when there are more variables (p) than there are samples (n) and it is highly sensitive to outlying observations. In this study, a robust partial least squares based classification method is proposed to handle data containing outliers where $n\ll p.$ The proposed method is applied to well-known benchmark datasets and its properties are explored by an extensive simulation study. 相似文献

2.

Minimum distance classification rules for high dimensional data

Muni S. Srivastava 《Journal of multivariate analysis》2006,97(9):2057-2070

In this article, the problem of classifying a new observation vector into one of the two known groups Π_i,i=1,2, distributed as multivariate normal with common covariance matrix is considered. The total number of observation vectors from the two groups is, however, less than the dimension of the observation vectors. A sample-squared distance between the two groups, using Moore-Penrose inverse, is introduced. A classification rule based on the minimum distance is proposed to classify an observation vector into two or several groups. An expression for the error of misclassification when there are only two groups is derived for large p and n=O(p^δ),0<δ<1. 相似文献

3.

An update algorithm for restricted random walk clustering for dynamic data sets

Markus Franke Andreas Geyer-Schulz 《Advances in Data Analysis and Classification》2009,3(1):63-92

In this article, we present a randomized dynamic cluster algorithm for large data sets. It is based on the restricted random walk cluster algorithm by Schöll and Schöll-Paschinger that has given good results in past studies. We discuss different approaches for the clustering of dynamic data sets. In contrast to most of these methods, dynamic restricted random walk clustering is also efficient for a small percentage of changes in the data set and has the additional advantage that the updates asymptotically produce the same clusters as a reclustering with the static variant; there is thus no need for any reclustering ever. In addition, the method has a relatively low computational complexity which enables it to cluster large data sets. 相似文献

4.

An efficient decentralized clustering algorithm for aggregation of noisy multi-mean data 总被引：1，自引：0，他引：1

Séamus Ó Buadhacháin Gregory Provan 《Journal of Heuristics》2015,21(2):301-328

相似文献

5.

An algorithm and efficient data structures for the binary knapsack problem

Uwe Suhl 《European Journal of Operational Research》1978,2(6):420-428

A branch-and-bound algorithm for the binary knapsack problem is presented which uses a combined stack and deque for storing the tree and the corresponding LP-relaxation. A reduction scheme is used to reduce the problem size. The algorithm was implemented in FORTRAN. Computational experience is based on 600 randomly generated test problems with up to 9000 zero-one variables. The average solution times (excluding an initial sorting step) increase linearly with problem size and compare favorably with other codes designed to solve binary knapsack problems. 相似文献

6.

Decision rules for efficient classification of biological data

Mario R. Guarracino Altannar Chinchuluun Panos M. Pardalos 《Optimization Letters》2009,3(3):357-366

Cancer classification using genomic data is one of the major research areas in the medical field. Therefore, a number of binary classification methods have been proposed in recent years. Top Scoring Pair (TSP) method is one of the most promising techniques that classify genomic data in a lower dimensional subspace using a simple decision rule. In the present paper, we propose a supervised classification technique that utilizes incremental generalized eigenvalue and top scoring pair classifiers to obtain higher classification accuracy with a small training set. We validate our method by applying it to well known microarray data sets. 相似文献

7.

Combined probabilistic algorithm for solving high dimensional problems

《Stochastics An International Journal of Probability and Stochastic Processes》2013,85(1):30-47

相似文献

8.

An efficient algorithm for voting sequences

《Mathematical Social Sciences》1987,14(1):59-75

The chaos theorems show that given almost any alternatives x and y, there exists voting sequence from x to y. However, proofs of such results have been purely existential; that is, there is no algorithm by which such a voting path can be constructed. In this paper, we present such an algorithm for one standard example. Furthermore, it is shown that the algorithm has the property that the voting sequence involves the fewest possible number of steps. 相似文献

9.

An efficient algorithm for solving inequalities 总被引：1，自引：0，他引：1

D. Q. Mayne M. Sahba 《Journal of Optimization Theory and Applications》1985,45(3):407-423

An efficient algorithm for solving a finite system of inequalities in a finite number of iterations is described and analyzed.This work was supported by the UK Science and Engineering Research Council 相似文献

10.

An efficient algorithm for linear programming

V. Ch. Venkaiah 《Proceedings Mathematical Sciences》1990,100(3):295-301

A simple but efficient algorithm is presented for linear programming. The algorithm computes the projection matrix exactly once throughout the computation unlike that of Karmarkar’s algorithm where in the projection matrix is computed at each and every iteration. The algorithm is best suitable to be implemented on a parallel architecture. Complexity of the algorithm is being studied. 相似文献

11.

A computationally fast variable importance test for random forests for high-dimensional data

Silke Janitza Ender Celik Anne-Laure Boulesteix 《Advances in Data Analysis and Classification》2018,12(4):885-915

Random forests are a commonly used tool for classification and for ranking candidate predictors based on the so-called variable importance measures. These measures attribute scores to the variables reflecting their importance. A drawback of variable importance measures is that there is no natural cutoff that can be used to discriminate between important and non-important variables. Several approaches, for example approaches based on hypothesis testing, were developed for addressing this problem. The existing testing approaches require the repeated computation of random forests. While for low-dimensional settings those approaches might be computationally tractable, for high-dimensional settings typically including thousands of candidate predictors, computing time is enormous. In this article a computationally fast heuristic variable importance test is proposed that is appropriate for high-dimensional data where many variables do not carry any information. The testing approach is based on a modified version of the permutation variable importance, which is inspired by cross-validation procedures. The new approach is tested and compared to the approach of Altmann and colleagues using simulation studies, which are based on real data from high-dimensional binary classification settings. The new approach controls the type I error and has at least comparable power at a substantially smaller computation time in the studies. Thus, it might be used as a computationally fast alternative to existing procedures for high-dimensional data settings where many variables do not carry any information. The new approach is implemented in the R package vita. 相似文献

12.

Sample covariance shrinkage for high dimensional dependent data

Alessio Sancetta 《Journal of multivariate analysis》2008,99(5):949-967

For high dimensional data sets the sample covariance matrix is usually unbiased but noisy if the sample is not large enough. Shrinking the sample covariance towards a constrained, low dimensional estimator can be used to mitigate the sample variability. By doing so, we introduce bias, but reduce variance. In this paper, we give details on feasible optimal shrinkage allowing for time series dependent observations. 相似文献

13.

Threshold for detecting high dimensional geometry in anisotropic random geometric graphs

Matthew Brennan Guy Bresler Brice Huang 《Random Structures and Algorithms》2024,64(1):125-137

相似文献

14.

An efficient algorithm for the three-guard problem

Xuehou Tan 《Discrete Applied Mathematics》2008,156(17):3312-3324

Given a simple polygon P with two vertices u and v, the three-guard problem asks whether three guards can move from u to v such that the first and third guards are separately on two boundary chains of P from u to v and the second guard is always kept to be visible from two other guards inside P. It is a generalization of the well-known two-guard problem, in which two guards move on the boundary chains from u to v and are always kept to be mutually visible. In this paper, we introduce the concept of link-2-ray shots, which can be considered as ray shots under the notion of link-2-visibility. Then, we show a one-to-one correspondence between the structure of the restrictions placed on the motion of two guards and the one placed on the motion of three guards, and generalize the solution for the two-guard problem to that for the three-guard problem. We can decide whether there exists a solution for the three-guard problem in O(nlogn) time, and if so generate a walk in O(nlogn+m) time, where n denotes the number of vertices of P and the size of the optimal walk. This improves upon the previous time bounds O(n²) and O(n²logn), respectively. Moreover, our results can be used to solve other more sophisticated geometric problems. 相似文献

15.

An efficient local search method for random 3-satisfiability

《Electronic Notes in Discrete Mathematics》2003

We report on some exceptionally good results in the solution of randomly generated 3-satisfiability instances using the “record-to-record travel (RRT)” local search method. When this simple, but less-studied algorithm is applied to random one-million variable instances from the problem's satisfiable phase, it seems to find satisfying truth assignments almost always in linear time, with the coefficient of linearity depending on the ratio α of clauses to variables in the generated instances. RRT has a parameter for tuning “greediness”. By lessening greediness, the linear time phase can be extended up to very close to the satisfiability threshold α_c. Such linear time complexity is typical for random-walk based local search methods for small values of α. Previously, however, it has been suspected that these methods necessarily lose their time linearity far below the satisfiability threshold. The only previously introduced algorithm reported to have nearly linear time complexity also close to the satisfiability threshold is the survey propagation (SP) algorithm. However, SP is not a local search method and is more complicated to implement than RRT. Comparative experiments with the WalkSAT local search algorithm show behavior somewhat similar to RRT, but with the linear time phase not extending quite as close to the satisfiability threshold. 相似文献

16.

An efficient algorithm for enumeration of triangulations

Sergei Bespamyatnikh 《Computational Geometry》2002,23(3):36-279

We consider the problem of enumerating triangulations of n points in the plane in general position. We introduce a tree of triangulations and present an algorithm for enumerating triangulations in O(loglogn) time per triangulation. It improves the previous bound by almost linear factor. 相似文献

17.

An efficient heuristic algorithm for minimum matching

P. Grassberger H. Freund 《Mathematical Methods of Operations Research》1990,34(4):239-253

We give a new heuristic algorithm for minimum matching problems and apply it to the Euclidean problem with random vertices in 2 dimensions. The algorithm is based on simulated annealing and performs in practice faster than previous heuristic algorithms yielding suboptimal solutions of the same good quality. From configurations with up toN=20.000 vertices in the unit square we estimate that the length of a minimum matching scales asymptotically asLN with (=0.3123±0.0016.

Zusammenfassung Wir stellen einen neuen heuristischen Algorithmus für minimale Matching-Probleme vor und wenden diesen auf das euklidische Problem mit zufÄlliger Punkteverteilung in 2 Dimensionen an. Auf Simulated Annealing basierend lÄuft der Algorithmus schneller als frühere heuristische Algorithmen und erreicht dabei suboptimale Lösungen gleich guter QualitÄt. Aus Konfigurationen mit bis zuN=20.000 Punkten im Einheitsquadrat schÄtzen wir, da\ für die LÄnge des minimalen Matchings asymptotischLN mit=0.3123±0.0016 gilt.

相似文献

18.

An efficient algorithm for maximal margin clustering

Jiming Peng Lopamudra Mukherjee Vikas Singh Dale Schuurmans Linli Xu 《Journal of Global Optimization》2012,52(1):123-137

Maximal margin based frameworks have emerged as a powerful tool for supervised learning. The extension of these ideas to the unsupervised case, however, is problematic since the underlying optimization entails a discrete component. In this paper, we first study the computational complexity of maximal hard margin clustering and show that the hard margin clustering problem can be precisely solved in O(n ^d+2) time where n is the number of the data points and d is the dimensionality of the input data. However, since it is well known that many datasets commonly ‘express’ themselves primarily in far fewer dimensions, our interest is in evaluating if a careful use of dimensionality reduction can lead to practical and effective algorithms. We build upon these observations and propose a new algorithm that gradually increases the number of features used in the separation model in each iteration, and analyze the convergence properties of this scheme. We report on promising numerical experiments based on a ‘truncated’ version of this approach. Our experiments indicate that for a variety of datasets, good solutions equivalent to those from other existing techniques can be obtained in significantly less time. 相似文献

19.

Graphs as navigational infrastructure for high dimensional data spaces

C. B. Hurley R. W. Oldford 《Computational Statistics》2011,26(4):585-612

We propose using graph theoretic results to develop an infrastructure that tracks movement from a display of one set of variables to another. The illustrative example throughout is the real-time morphing of one scatterplot into another. Hurley and Oldford (J Comput Graph Stat 2010) made extensive use of the graph having variables as nodes and edges indicating a paired relationship between them. The present paper introduces several new graphs derivable from this one whose traversals can be described as particular movements through high dimensional spaces. These are connected to known results in graph theory and the graph theoretic results applied to the problem of visualizing high-dimensional data. 相似文献

20.

An efficient one—dimensional search procedure for barrier functions

L. S. Lasdon R. L. Fox M. W. Ratner 《Mathematical Programming》1973,4(1):279-296

In minimizing interior penalty functions, most of the computational time is spent on the one-dimensional search. This paper presents a method for performing this search on barrier functions which is significantly faster than current techniques. The method exploits the special structure of barrier functions. Comparative computational results are given for a set of six test problems.This research was partially supported by the National Aeronautics and Space Administration under Research Grant NSG 110-61 and by the Office of Naval Research under Grant No. N00014-67-A-0404-0010. 相似文献