Similar Documents
20 similar documents found (search time: 15 ms).
1.
A robust principal component analysis for samples from a bivariate distribution function is described. The method is based on robust estimators of dispersion in the univariate case, together with a certain linearization of the bivariate structure. Besides the continuity of the functional defining the direction of the suitably modified principal axis, we prove consistency of the corresponding sequence of estimators. Asymptotic normality is established under some additional conditions.
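As an illustration of the dispersion-based construction, here is a minimal sketch, not the authors' exact estimator: it robustifies the 2x2 dispersion matrix with the MAD and the Gnanadesikan-Kettenring identity cov(x, y) = (s(x+y)^2 - s(x-y)^2)/4, then takes the leading eigenvector as the principal axis. The helper names are hypothetical.

```python
import numpy as np

def mad(x):
    """Median absolute deviation, scaled for consistency at the normal."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def robust_cov_gk(x, y, scale=mad):
    """Gnanadesikan-Kettenring covariance from univariate dispersions:
    cov(x, y) = (s(x+y)^2 - s(x-y)^2) / 4."""
    return (scale(x + y) ** 2 - scale(x - y) ** 2) / 4.0

def robust_principal_axis(x, y, scale=mad):
    """First principal axis of a 2x2 robust dispersion matrix."""
    C = np.array([[scale(x) ** 2, robust_cov_gk(x, y, scale)],
                  [robust_cov_gk(x, y, scale), scale(y) ** 2]])
    vals, vecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    return vecs[:, -1], vals[-1]     # leading axis and its dispersion

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.8 * x + 0.3 * rng.normal(size=500)
x[:10] += 20.0                       # a few gross outliers
axis, disp = robust_principal_axis(x, y)
print(axis, disp)                    # axis stays near the clean direction
```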

2.
In the present paper, empirical influence functions (EIFs) are derived for the eigenvalues and eigenfunctions in functional principal component analysis, both when the smoothing parameter is fixed and when it is not. Based on the derived influence functions, a sensitivity analysis procedure is proposed for detecting jointly as well as singly influential observations. A numerical example is given to show the usefulness of the proposed procedure. In dealing with the influence on the eigenfunctions, two different kinds of influence statistics are introduced: one is based on the EIF for the coefficient vectors of the basis function expansion, and the other on the sampled vectors of the functional EIF. Under a certain condition it can be proved that both kinds of statistics provide essentially equivalent results.
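For intuition, the classical multivariate analogue of the eigenvalue EIF is IF(x; lambda) = (v'(x - mu))^2 - lambda. The sketch below is a hedged stand-in for the functional case, where the same idea operates on basis-expansion coefficients; the function name is hypothetical.

```python
import numpy as np

def eif_leading_eigenvalue(X):
    """Empirical influence of each row on the leading eigenvalue of the
    sample covariance matrix: EIF(x) = (v'(x - mean))^2 - lambda.
    (Multivariate analogue; the paper works with functional data.)"""
    Xc = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(S)
    lam, v = vals[-1], vecs[:, -1]
    return (Xc @ v) ** 2 - lam

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X[0] += 6.0                             # one influential observation
infl = eif_leading_eigenvalue(X)
print(np.argsort(-np.abs(infl))[:3])    # flags observation 0 first
```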

3.
The available methods for handling missing values in principal component analysis only provide point estimates of the parameters (axes and components) and of the missing values. To take into account the variability due to missing values, a multiple imputation method is proposed. First, a method to generate multiple imputed data sets from a principal component analysis model is defined. Then, two ways to visualize the uncertainty due to missing values in the principal component analysis results are described. The first consists in projecting the imputed data sets onto a reference configuration as supplementary elements, to assess the stability of the individuals (respectively, of the variables). The second consists in performing a principal component analysis on each imputed data set and fitting each resulting configuration onto the reference one with a Procrustes rotation. The latter strategy makes it possible to assess the variability of the principal component analysis parameters induced by the missing values. The methodology is then evaluated on a real data set.
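A rough sketch of the two ingredients, with hypothetical helper names: iterative-PCA imputation of the missing cells, followed by multiple draws obtained by adding residual noise in those cells. The paper's draws are model-based; this noise-injection version is only a stand-in. Each imputed set could then be projected onto a reference configuration as supplementary points, or re-analyzed and Procrustes-fitted.

```python
import numpy as np

def em_pca_impute(X, rank=2, n_iter=100):
    """Iterative PCA imputation: fill missing cells with the rank-k
    reconstruction until the fit stabilizes (simplified EM-style loop)."""
    mask = np.isnan(X)
    Xf = np.where(mask, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = Xf.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xf - mu, full_matrices=False)
        recon = mu + (U[:, :rank] * s[:rank]) @ Vt[:rank]
        Xf[mask] = recon[mask]
    return Xf, recon, mask

def multiple_imputations(X, rank=2, M=20, seed=0):
    """Draw M imputed data sets by adding residual noise to the PCA
    reconstruction in the missing cells only."""
    rng = np.random.default_rng(seed)
    Xf, recon, mask = em_pca_impute(X, rank)
    sigma = np.std((Xf - recon)[~mask])  # residual scale from observed cells
    draws = []
    for _ in range(M):
        Xi = Xf.copy()
        Xi[mask] = recon[mask] + rng.normal(0.0, sigma, mask.sum())
        draws.append(Xi)
    return draws
```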

4.
Kernel principal component analysis (KPCA) extends linear PCA from a real vector space to any high-dimensional kernel feature space. The sensitivity of linear PCA to outliers is well known, and various robust alternatives have been proposed in the literature. For KPCA, such robust versions have received considerably less attention. In this article we present kernel versions of three robust PCA algorithms: spherical PCA, projection pursuit, and ROBPCA. These robust KPCA algorithms are analyzed in a classification context, applying discriminant analysis on the KPCA scores. The performance of the different robust KPCA algorithms is studied in a simulation study comparing misclassification percentages on both clean and contaminated data. An outlier map is constructed to visualize outliers in such classification problems. A real-life example from protein classification illustrates the usefulness of robust KPCA and its corresponding outlier map.
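Of the three algorithms, spherical PCA is the easiest to sketch. The version below works in the input space only; the article's contribution is its kernelization, which is not shown here.

```python
import numpy as np

def spatial_median(X, n_iter=100, tol=1e-8):
    """Weiszfeld iterations for the spatial median."""
    m = X.mean(axis=0)
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(X - m, axis=1), tol)
        m_new = (X / d[:, None]).sum(axis=0) / (1.0 / d).sum()
        if np.linalg.norm(m_new - m) < tol:
            break
        m = m_new
    return m

def spherical_pca(X, k=2):
    """Spherical PCA: project observations onto the unit sphere around
    the spatial median, then take the leading eigenvectors of the
    resulting covariance. Outliers keep their direction but lose
    their leverage."""
    m = spatial_median(X)
    Z = X - m
    Z /= np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    vals, vecs = np.linalg.eigh(Z.T @ Z / len(Z))
    return vecs[:, ::-1][:, :k], m   # robust axes and robust center
```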

5.
6.
In this paper, we propose auto-associative (AA) models to generalize principal component analysis (PCA). AA models were introduced in data analysis from a geometrical point of view; they are based on approximating the scatter plot of the observations by a differentiable manifold. Here they are interpreted as projection pursuit models adapted to the auto-associative case. Their theoretical properties are established and shown to extend those of PCA. An iterative construction algorithm is proposed, and its principle is illustrated on both simulated and real data from image analysis.
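A loose sketch of one auto-associative iteration, under strong simplifications (the PCA direction as the pursuit index, a polynomial instead of the paper's smoothers); all names are hypothetical. The point is that each step fits a smooth curve x ~ s(a'x) rather than PCA's straight line, and the residuals feed the next iteration.

```python
import numpy as np

def aa_step(X, degree=3):
    """One simplified auto-associative iteration: choose an axis by PCA
    (standing in for a projection pursuit index), project onto it, then
    approximate each coordinate by a polynomial in the projection."""
    mu = X.mean(axis=0)
    Xc = X - mu
    a = np.linalg.svd(Xc, full_matrices=False)[2][0]  # pursuit direction
    t = Xc @ a                                        # 1-d index
    V = np.vander(t, degree + 1)                      # polynomial design
    coef, *_ = np.linalg.lstsq(V, Xc, rcond=None)
    fitted = V @ coef                                 # points on the manifold
    return Xc - fitted + mu, fitted + mu              # residuals, fitted curve
```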

7.
Computational considerations in functional principal component analysis   (cited 3 times: 0 self-citations, 3 by others)
Computing estimates in functional principal component analysis (FPCA) from discrete data is usually based on approximating the sample curves in terms of a basis (splines, wavelets, trigonometric functions, etc.) and a geometrical structure in the data space (L² spaces, Sobolev spaces, etc.). Until now, computational efforts have focused on developing ad hoc algorithms to approximate those estimates, by first selecting an efficient approximation technique and a convenient geometrical structure. The main goal of this paper is to establish a procedure for formulating the algorithm that computes FPCA estimates under general settings. The resulting algorithm is based on the classic multivariate PCA of a certain random vector and can thus be implemented in the majority of statistical packages. In fact, it is derived from an analysis of the effects of modifying the norm in the space of coordinates. Finally, an application to real data illustrates the derived theoretical results. This research was supported by Project MTM2004-5992 from the Dirección General de Investigación, Ministerio de Ciencia y Tecnología.
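The reduction to multivariate PCA can be sketched for the simplest setting, cubic B-splines with the L² inner product and trapezoidal quadrature (the paper treats general bases and norms): with Gram matrix G = LL' of the basis, the FPCA of curves with coefficient matrix A is the ordinary PCA of AL. Function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(n_basis, degree=3, n_grid=201):
    """Evaluate a B-spline basis of dimension n_basis on a grid in [0, 1]."""
    grid = np.linspace(0.0, 1.0, n_grid)
    knots = np.concatenate([np.zeros(degree),
                            np.linspace(0.0, 1.0, n_basis - degree + 1),
                            np.ones(degree)])
    cols = [BSpline.basis_element(knots[j:j + degree + 2],
                                  extrapolate=False)(grid)
            for j in range(n_basis)]
    return np.nan_to_num(np.column_stack(cols)), grid

def fpca_from_coefficients(A, B, grid):
    """FPCA as ordinary multivariate PCA of transformed coefficients:
    with Gram matrix G = L L' of the basis under the L2 inner product,
    the FPCA of curves with coefficient rows A is the PCA of Z = A L;
    eigenfunction coefficients are recovered via L^{-T}."""
    h = grid[1] - grid[0]
    w = np.full(len(grid), h)
    w[[0, -1]] = h / 2.0                      # trapezoidal quadrature weights
    G = (B * w[:, None]).T @ B                # Gram matrix of the basis
    L = np.linalg.cholesky(G + 1e-12 * np.eye(len(G)))
    Z = (A - A.mean(axis=0)) @ L              # ordinary data matrix
    vals, vecs = np.linalg.eigh(Z.T @ Z / len(Z))
    eigfun_coefs = np.linalg.solve(L.T, vecs[:, ::-1])
    return vals[::-1], eigfun_coefs           # eigenvalues, basis coefficients
```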

8.
Classical principal component analysis is one of the most widely used techniques for exploring the multivariate association pattern of variables. On the other hand, conditioning is one of the most promising ideas for controlling the variability of observed data. Here we present a review of some conditioning methods, from the analysis of residuals of a parametric model to the analysis of the local variation defined by means of a non-oriented graph of individuals, this variation being defined either as the deviation from a local mean or, alternatively, as the differences among contiguous vertices. We compare these approaches and show that under some conditions they give comparable results. Finally, we present an example of application to illustrate the stated results.

9.
We produce approximation bounds on a semidefinite programming relaxation for sparse principal component analysis. Since the sparse maximum eigenvalue problem cannot be efficiently approximated up to a constant approximation ratio, our bounds depend on the optimal value of the semidefinite relaxation: the higher this value, the better the approximation. In particular, these bounds allow us to control approximation ratios for tractable statistics in hypothesis testing problems where data points are sampled from Gaussian models with a single sparse leading component.
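The relaxation in question can be written down concisely. The sketch below assumes the cvxpy package and follows a standard d'Aspremont-type formulation (trace normalization plus an l1 penalty); the exact relaxation the bounds are proved for may differ in details.

```python
import numpy as np
import cvxpy as cp

def sparse_pca_sdp(Sigma, rho):
    """SDP relaxation of sparse PCA:
    maximize tr(Sigma X) - rho * ||X||_1  s.t.  tr(X) = 1, X PSD.
    The leading eigenvector of the optimal X gives a (relaxed) sparse PC."""
    n = Sigma.shape[0]
    X = cp.Variable((n, n), PSD=True)
    objective = cp.Maximize(cp.trace(Sigma @ X) - rho * cp.sum(cp.abs(X)))
    problem = cp.Problem(objective, [cp.trace(X) == 1])
    problem.solve()
    _, vecs = np.linalg.eigh(X.value)
    return vecs[:, -1], problem.value
```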

10.
Dey, Santanu S.; Molinaro, Marco; Wang, Guanyi. Mathematical Programming, 2023, 199(1–2): 421–459.
Sparse principal component analysis with global support (SPCAgs) is the problem of finding the top-r leading principal components such that all these principal...

11.
In this paper, we apply orthogonally equivariant spatial sign covariance matrices, as well as their affine equivariant counterparts, in principal component analysis. The influence functions and asymptotic covariance matrices of eigenvectors based on robust covariance estimators are derived in order to compare their robustness and efficiency properties. We show in particular that the estimators that use pairwise differences of the observed data have very good efficiency properties, providing practical robust alternatives to methods based on the classical sample covariance matrix.
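Both estimators are simple to compute. The sketch below gives the spatial sign covariance matrix around the coordinatewise median and its symmetrized, pairwise-difference variant (function names are hypothetical). The eigenvectors of either matrix estimate the principal axes robustly; the eigenvalues need separate treatment.

```python
import numpy as np

def spatial_signs(Z):
    """Project rows onto the unit sphere (spatial signs)."""
    return Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)

def sscm(X):
    """Spatial sign covariance matrix around the coordinatewise median."""
    S = spatial_signs(X - np.median(X, axis=0))
    return S.T @ S / len(S)

def sscm_pairwise(X):
    """Symmetrized version built from spatial signs of pairwise
    differences: no location estimate is needed."""
    D = (X[:, None, :] - X[None, :, :])[np.triu_indices(len(X), k=1)]
    S = spatial_signs(D)
    return S.T @ S / len(S)

# eigenvectors of either matrix estimate the principal axes robustly
X = np.random.default_rng(2).standard_normal((100, 3)) @ np.diag([3.0, 1.0, 0.5])
_, vecs = np.linalg.eigh(sscm_pairwise(X))
print(vecs[:, ::-1])   # leading axis first
```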

12.
We present a branch-and-bound approach to find exactly the best sparse dimension reduction of a matrix. One can choose between enforcing orthogonality of the coefficients and uncorrelatedness of the components, and can explicitly set the degree of sparsity. We suggest methods to choose the number of non-zero loadings for each component, and we illustrate and compare our approach with existing methods on a benchmark data set.
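For small numbers of variables the same problem can be solved by brute force, which is the baseline a branch-and-bound search prunes. The sketch below finds the exact best k-sparse leading component by enumerating supports; it is exponential in the number of variables, so illustrative only.

```python
import numpy as np
from itertools import combinations

def best_k_sparse_pc(Sigma, k):
    """Exact k-sparse leading component by support enumeration: for each
    size-k subset, the best vector is the leading eigenvector of the
    corresponding principal submatrix of Sigma."""
    p = Sigma.shape[0]
    best_val, best_v = -np.inf, None
    for support in combinations(range(p), k):
        idx = np.array(support)
        vals, vecs = np.linalg.eigh(Sigma[np.ix_(idx, idx)])
        if vals[-1] > best_val:
            best_val = vals[-1]
            v = np.zeros(p)
            v[idx] = vecs[:, -1]
            best_v = v
    return best_v, best_val
```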

13.
For Principal Component Analysis in Reproducing Kernel Hilbert Spaces (KPCA), optimization over sets containing only linear combinations of all n-tuples of kernel functions is investigated, where n is a positive integer smaller than the number of data points. Upper bounds on the accuracy in approximating the optimal solution, achievable without restrictions on the number of kernel functions, are derived. The rates of decrease of the upper bounds as the number n of kernel functions increases are given by the sum of two terms, one proportional to n^{-1/2} and the other to n^{-1}, and depend on the maximum eigenvalue of the Gram matrix of the kernel with respect to the data. Primal and dual formulations of KPCA are considered. The estimates provide insights into the effectiveness of sparse KPCA techniques, which aim to reduce the computational cost of expansions in terms of kernel units.
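A suboptimal solution of this type can be computed with a Nyström-style restriction to n landmark points. The sketch below illustrates the object being bounded, not the paper's analysis; feature-space centering is omitted for brevity.

```python
import numpy as np
from scipy.linalg import eigh

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def reduced_kpca(X, n_landmarks, k=2, gamma=1.0, seed=0):
    """KPCA restricted to linear combinations of n kernel functions
    centred on a random subset of the data: solve the restricted
    generalized eigenproblem (Kmn' Kmn / m) a = lam * Knn a."""
    idx = np.random.default_rng(seed).choice(len(X), n_landmarks,
                                             replace=False)
    Kmn = rbf_kernel(X, X[idx], gamma)        # m x n cross-kernel
    Knn = rbf_kernel(X[idx], X[idx], gamma)   # n x n landmark Gram matrix
    vals, A = eigh(Kmn.T @ Kmn / len(X), Knn + 1e-10 * np.eye(n_landmarks))
    return vals[::-1][:k], A[:, ::-1][:, :k], idx
```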

14.
Functional principal component analysis is the preliminary step for representing the data in a lower-dimensional space and capturing the main modes of variability of the data by means of a small number of components that are linear combinations of the original variables. The sensitivity of the variance and covariance functions to irregular observations makes this method vulnerable to outliers, so that it may not capture the variation of the regular observations. In this study, we propose a robust functional principal component analysis that finds the linear combinations of the original variables containing most of the information, even if there are outliers, and that flags functional outliers. We demonstrate the performance of the proposed method in an extensive simulation study and on two datasets from chemometrics and environmental science.

15.
This paper is concerned with the allocation of multi-attribute records on several disks so as to achieve a high degree of concurrency of disk access when responding to partial match queries. An algorithm to distribute a set of multi-attribute records onto different disks is presented. Since our allocation method uses principal component analysis, this concept is first introduced. We then use it to generate a set of real numbers, the projections on the first principal component direction, which can be viewed as hashing addresses. Based on these hashing addresses, we propose an algorithm to allocate multi-attribute records onto different disks. Experimental results show that our method can indeed be used to solve the multi-disk data allocation problem for concurrent access.
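The overall scheme can be sketched in a few lines (the helper name is hypothetical): project the records on the first principal component to get hashing addresses, sort, and assign consecutive addresses to different disks, so that records likely to satisfy the same partial match query are spread across disks.

```python
import numpy as np

def allocate_records(records, n_disks):
    """Project multi-attribute records on the first principal axis and
    assign consecutive projections to different disks round-robin, so
    similar records tend to land on different disks."""
    X = np.asarray(records, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    addr = Xc @ Vt[0]                           # hashing addresses
    order = np.argsort(addr)
    disk = np.empty(len(X), dtype=int)
    disk[order] = np.arange(len(X)) % n_disks   # round-robin by address
    return disk
```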

16.
An augmented Lagrangian approach for sparse principal component analysis   (cited 1 time: 0 self-citations, 1 by others)
Principal component analysis (PCA) is a widely used technique for data analysis and dimension reduction with numerous applications in science and engineering. However, standard PCA suffers from the fact that the principal components (PCs) are usually linear combinations of all the original variables, and it is thus often difficult to interpret the PCs. To alleviate this drawback, various sparse PCA approaches have been proposed in the literature (Cadima and Jolliffe in J Appl Stat 22:203–214, 1995; d'Aspremont et al. in J Mach Learn Res 9:1269–1294, 2008; d'Aspremont et al. in SIAM Rev 49:434–448, 2007; Jolliffe in J Appl Stat 22:29–35, 1995; Journée et al. in J Mach Learn Res 11:517–553, 2010; Jolliffe et al. in J Comput Graph Stat 12:531–547, 2003; Moghaddam et al. in Advances in Neural Information Processing Systems 18:915–922, MIT Press, Cambridge, 2006; Shen and Huang in J Multivar Anal 99(6):1015–1034, 2008; Zou et al. in J Comput Graph Stat 15(2):265–286, 2006). Despite their success in achieving sparsity, these methods lose some important properties enjoyed by standard PCA, such as uncorrelatedness of the PCs and orthogonality of the loading vectors. Also, the total explained variance that they attempt to maximize can be too optimistic. In this paper we propose a new formulation for sparse PCA, aiming at finding sparse and nearly uncorrelated PCs with orthogonal loading vectors while explaining as much of the total variance as possible. We also develop a novel augmented Lagrangian method for solving a class of nonsmooth constrained optimization problems, which is well suited to our formulation of sparse PCA. We show that it converges to a feasible point and, moreover, under some regularity assumptions, that it converges to a stationary point. Additionally, we propose two nonmonotone gradient methods for solving the augmented Lagrangian subproblems and establish their global and local convergence. Finally, we compare our sparse PCA approach with several existing methods on synthetic (Zou et al. in J Comput Graph Stat 15(2):265–286, 2006), Pitprops (Jeffers in Appl Stat 16:225–236, 1967), and gene expression data (Chin et al. in Cancer Cell 10:529–541, 2006). The computational results demonstrate that the sparse PCs produced by our approach substantially outperform those of other methods in terms of total explained variance, correlation of PCs, and orthogonality of loading vectors. Moreover, the experiments on random data show that our method is capable of solving large-scale problems within a reasonable amount of time.
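The augmented Lagrangian machinery itself can be sketched generically. The skeleton below handles smooth f and equality constraints only, whereas the paper's method treats nonsmooth terms and uses nonmonotone gradient methods for the subproblems, so this is a stand-in for the structure, not the paper's algorithm.

```python
import numpy as np

def augmented_lagrangian(grad_f, c, jac_c, x0,
                         n_outer=30, n_inner=200, mu=10.0, step=1e-2):
    """Skeleton of an augmented Lagrangian method for
    min f(x) s.t. c(x) = 0: repeatedly (approximately) minimize
    f(x) + lam'c(x) + (mu/2)||c(x)||^2, then update the multipliers."""
    x = np.asarray(x0, dtype=float).copy()
    lam = np.zeros(len(c(x)))
    for _ in range(n_outer):
        for _ in range(n_inner):              # crude inner gradient solver
            g = grad_f(x) + jac_c(x).T @ (lam + mu * c(x))
            x = x - step * g
        lam = lam + mu * c(x)                 # first-order multiplier update
    return x
```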

17.
Principal component analysis (PCA) is a widely used tool for data analysis and dimension reduction in applications throughout science and engineering. However, the principal components (PCs) can sometimes be difficult to interpret, because they are linear combinations of all the original variables. To facilitate interpretation, sparse PCA produces modified PCs with sparse loadings, i.e., loadings with very few non-zero elements. In this paper, we propose a new sparse PCA method, namely sparse PCA via regularized SVD (sPCA-rSVD). We use the connection of PCA with the singular value decomposition (SVD) of the data matrix and extract the PCs by solving a low-rank matrix approximation problem. Regularization penalties are introduced into the corresponding minimization problem to promote sparsity in the PC loadings. An efficient iterative algorithm is proposed for computation, and two tuning parameter selection methods are discussed. Some theoretical results are established to justify the use of sPCA-rSVD when only the data covariance matrix is available. In addition, we give a modified definition of the variance explained by the sparse PCs. sPCA-rSVD provides a uniform treatment of both classical multivariate data and high-dimension-low-sample-size (HDLSS) data. Further understanding of sPCA-rSVD and some existing alternatives is gained through simulation studies and real data examples, which suggest that sPCA-rSVD provides competitive results.
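The alternating updates behind an sPCA-rSVD-type algorithm are short. The rank-one sketch below uses an l1 (soft-thresholding) penalty and absorbs constants into the tuning parameter lam, so it should be read as an approximation of the published procedure rather than a faithful implementation.

```python
import numpy as np

def soft(v, lam):
    """Elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_pc_rank1(X, lam, n_iter=200, tol=1e-8):
    """Alternate the closed-form updates of the penalized rank-one
    approximation ||X - u v'||_F^2 + penalty: a soft-thresholded
    loadings update and a normalized score update."""
    Xc = X - X.mean(axis=0)
    u = np.linalg.svd(Xc, full_matrices=False)[0][:, 0]
    v_old = np.zeros(Xc.shape[1])
    for _ in range(n_iter):
        v = soft(Xc.T @ u, lam)
        if np.linalg.norm(v) < tol:
            return v, u                      # penalty removed all loadings
        u = Xc @ v
        u /= np.linalg.norm(u)
        if np.linalg.norm(v - v_old) < tol:
            break
        v_old = v
    return v / np.linalg.norm(v), u          # sparse loading, score direction
```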

18.
Most existing procedures for sparse principal component analysis (PCA) use a penalty function to obtain a sparse matrix of weights by which a data matrix is post-multiplied to produce PC scores. In this paper, we propose a new sparse PCA procedure that differs from the existing ones in two ways. First, the new procedure does not sparsify the weight matrix; instead, the so-called loadings matrix, by which the score matrix is post-multiplied to approximate the data matrix, is sparsified. Second, the cardinality of the loadings matrix, i.e., the total number of nonzero loadings, is pre-specified as an integer, without using penalty functions. The procedure is called unpenalized sparse loading PCA (USLPCA). A desirable property of USLPCA is that the indices for the percentages of explained variance can be defined in the same form as in standard PCA. We develop an alternating least squares algorithm for USLPCA which uses the fact that the PCA loss function can be decomposed as the sum of a term irrelevant to the loadings and another that is easily minimized under cardinality constraints. A procedure is also presented for selecting the best cardinality using information criteria. The procedures are assessed in a simulation study and illustrated with real data examples.
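A minimal sketch of the alternating least squares idea, assuming orthonormal score columns: the loadings update X'F is exact without constraints and is then truncated to the prespecified cardinality, and the scores update is an orthogonal Procrustes step. Initialization details, convergence checks, and the information-criterion selection are omitted; names are hypothetical.

```python
import numpy as np

def uslpca(X, r, cardinality, n_iter=100):
    """Unpenalized sparse loading PCA sketch: approximate X ~ F B' with
    orthonormal scores F and a loadings matrix B restricted to a fixed
    total number of nonzeros (no penalty function involved)."""
    Xc = X - X.mean(axis=0)
    F = np.linalg.svd(Xc, full_matrices=False)[0][:, :r]  # PCA start
    for _ in range(n_iter):
        B = Xc.T @ F                           # unconstrained LS update
        thr = np.sort(np.abs(B), axis=None)[-cardinality]
        B[np.abs(B) < thr] = 0.0               # keep the largest |loadings|
        U, _, Vt = np.linalg.svd(Xc @ B, full_matrices=False)
        F = U @ Vt                             # orthogonal Procrustes update
    return F, B
```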

19.
Suboptimal solutions to kernel principal component analysis are considered. Such solutions take the form of linear combinations of all n-tuples of kernel functions centered on the data, where n is a positive integer smaller than the cardinality m of the data sample. Their accuracy in approximating the optimal solution, obtained in general for n = m, is estimated. The analysis made in Gnecco and Sanguineti (Comput Optim Appl 42:265–287, 2009) is extended: the estimates derived therein for the approximation of the first principal axis are improved, and extensions to the successive principal axes are derived.

20.
Several decompositions of the orthogonal projector P_X = X(X'X)^- X' (where (X'X)^- denotes a generalized inverse) are proposed, with a view to their use in constrained principal component analysis (CPCA). In CPCA, the main data matrix X is first decomposed into several additive components by the row-side and/or column-side predictor variables G and H. The decomposed components are then subjected to singular value decomposition (SVD) to explore structures within the components. Unlike previous proposals, the current one ensures that the decomposed parts are columnwise orthogonal and stay inside the column space of X. Mathematical properties of the decompositions and their data-analytic implications are investigated. Extensions to regularized PCA are also envisaged, considering analogous decompositions of ridge operators.
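One simple decomposition of this kind can be verified numerically: project the external variables G into the column space of X and split P_X into that part and its complement within col(X). This is a sketch in the spirit of CPCA, not necessarily one of the paper's own decompositions.

```python
import numpy as np

def proj(M):
    """Orthogonal projector onto col(M): P_M = M (M'M)^- M', computed
    here via the Moore-Penrose inverse."""
    return M @ np.linalg.pinv(M)

def decompose_PX(X, G):
    """Split P_X = P_1 + P_2 with P_1 = P_{P_X G} and P_2 = P_X - P_1.
    Both parts are projectors, mutually orthogonal, and project onto
    subspaces of the column space of X."""
    PX = proj(X)
    P1 = proj(PX @ G)
    return P1, PX - P1

rng = np.random.default_rng(3)
X, G = rng.normal(size=(30, 5)), rng.normal(size=(30, 2))
P1, P2 = decompose_PX(X, G)
print(np.allclose(P1 @ P2, 0.0), np.allclose(P1 + P2, proj(X)))
# each part can now be subjected to SVD to explore its structure
```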
