Similar Articles
20 similar articles found (search time: 31 ms)
1.
In this article, we propose a new framework for matrix factorization based on principal component analysis (PCA) where sparsity is imposed. The structure to impose sparsity is defined in terms of groups of correlated variables found in correlation matrices or maps. The framework is based on three new contributions: an algorithm to identify the groups of variables in correlation maps, a visualization for the resulting groups, and a matrix factorization. Together with a method to compute correlation maps with minimum noise level, referred to as missing-data for exploratory data analysis (MEDA), these three contributions constitute a complete matrix factorization framework. Two real examples are used to illustrate the approach and compare it with PCA, sparse PCA, and structured sparse PCA. Supplementary materials for this article are available online.
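As a rough illustration of the pipeline this abstract describes, the sketch below groups correlated variables with a simple threshold rule (a stand-in for the paper's correlation-map algorithm and its MEDA-based noise control, which are not reproduced here) and then builds one rank-1 factor per group, so each loading vector is sparse by construction. All function names and the threshold are illustrative assumptions.

```python
import numpy as np

def correlated_groups(X, thresh=0.7):
    """Greedy grouping of variables whose |correlation| exceeds `thresh`.
    A simple stand-in for the paper's correlation-map algorithm."""
    R = np.corrcoef(X, rowvar=False)
    unassigned, groups = set(range(R.shape[0])), []
    while unassigned:
        seed = unassigned.pop()
        group = {seed} | {j for j in unassigned if abs(R[seed, j]) >= thresh}
        unassigned -= group
        groups.append(sorted(group))
    return groups

def group_sparse_factor(X, groups):
    """One rank-1 factor per group: loadings are zero outside the group."""
    Xc = X - X.mean(0)
    loadings = np.zeros((X.shape[1], len(groups)))
    for k, g in enumerate(groups):
        # leading right singular vector of the sub-matrix for this group
        _, _, Vt = np.linalg.svd(Xc[:, g], full_matrices=False)
        loadings[g, k] = Vt[0]
    return Xc @ loadings, loadings

rng = np.random.default_rng(0)
z = rng.normal(size=(100, 2))
X = np.hstack([z[:, [0]] + 0.1 * rng.normal(size=(100, 3)),
               z[:, [1]] + 0.1 * rng.normal(size=(100, 3))])
groups = correlated_groups(X)
scores, W = group_sparse_factor(X, groups)
print(groups)
print(W.round(2))
```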

2.
The aim of this article is to develop a supervised dimension-reduction framework, called spatially weighted principal component analysis (SWPCA), for high-dimensional imaging classification. Two main challenges in imaging classification are the high dimensionality of the feature space and the complex spatial structure of imaging data. In SWPCA, we introduce two sets of novel weights, including global and local spatial weights, which enable a selective treatment of individual features and incorporation of the spatial structure of imaging data and class label information. We develop an efficient two-stage iterative SWPCA algorithm and its penalized version along with the associated weight determination. We use both simulation studies and real data analysis to evaluate the finite-sample performance of our SWPCA. The results show that SWPCA outperforms several competing principal component analysis (PCA) methods, such as supervised PCA (SPCA), and other competing methods, such as sparse discriminant analysis (SDA).
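A toy sketch of the supervised-weighting idea: each feature is rescaled by a label-derived weight before PCA. The weight used here (absolute feature-label correlation) and the function name are assumptions; SWPCA's actual two-stage algorithm with separate global and local spatial weights is not reproduced.

```python
import numpy as np

def weighted_pca(X, y, n_components=2):
    """PCA after scaling each feature by a supervised weight.
    A toy stand-in for SWPCA's global weights: here the weight is the
    absolute correlation of each feature with the class label."""
    Xc = X - X.mean(0)
    w = np.abs([np.corrcoef(Xc[:, j], y)[0, 1] for j in range(X.shape[1])])
    Xw = Xc * w                       # selective treatment of features
    _, _, Vt = np.linalg.svd(Xw, full_matrices=False)
    return Xw @ Vt[:n_components].T   # supervised PC scores

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 50))
X[:, 0] += 2 * y                      # one informative feature
scores = weighted_pca(X, y)
print(scores.shape)                   # (200, 2)
```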

3.
Principal component analysis (PCA) is a widely used tool for data analysis and dimension reduction in applications throughout science and engineering. However, the principal components (PCs) can sometimes be difficult to interpret, because they are linear combinations of all the original variables. To facilitate interpretation, sparse PCA produces modified PCs with sparse loadings, i.e., loadings with very few non-zero elements. In this paper, we propose a new sparse PCA method, namely sparse PCA via regularized SVD (sPCA-rSVD). We use the connection of PCA with the singular value decomposition (SVD) of the data matrix and extract the PCs through solving a low-rank matrix approximation problem. Regularization penalties are introduced to the corresponding minimization problem to promote sparsity in the PC loadings. An efficient iterative algorithm is proposed for computation. Two tuning parameter selection methods are discussed. Some theoretical results are established to justify the use of sPCA-rSVD when only the data covariance matrix is available. In addition, we give a modified definition of the variance explained by the sparse PCs. The sPCA-rSVD provides a uniform treatment of both classical multivariate data and high-dimension-low-sample-size (HDLSS) data. Further understanding of sPCA-rSVD and some existing alternatives is gained through simulation studies and real data examples, which suggest that sPCA-rSVD provides competitive results.
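The alternating scheme behind sPCA-rSVD is easy to sketch for a rank-1 fit: update the score direction by a power step and the loading vector by soft thresholding, which is the lasso-penalized update. A minimal sketch with an arbitrary penalty level, not the authors' implementation (which also covers tuning-parameter selection and further penalties):

```python
import numpy as np

def soft(z, lam):
    """Elementwise soft thresholding (the lasso-penalty update)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def spca_rsvd_rank1(X, lam=0.1, n_iter=100):
    """Rank-1 sparse PCA via regularized SVD, in the spirit of sPCA-rSVD."""
    Xc = X - X.mean(0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    u, v = U[:, 0], s[0] * Vt[0]           # warm start from plain SVD
    for _ in range(n_iter):
        v = soft(Xc.T @ u, lam)            # sparse loading update
        u = Xc @ v
        u /= np.linalg.norm(u) + 1e-12     # unit-norm score direction
    return u, v

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 20))
X[:, :3] += np.outer(rng.normal(size=60), [3, 3, 3])   # sparse signal
u, v = spca_rsvd_rank1(X, lam=2.0)
print(np.nonzero(v)[0])                    # mostly the first 3 variables
```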

4.
The article begins with a review of the main approaches to interpreting the results of principal component analysis (PCA) over the last 50–60 years. The simple structure approach is compared with the modern approach of sparse PCA, where interpretable solutions are obtained directly. It is shown that their goals are identical, but that they differ in the way they are realized. Next, the most popular and influential methods for sparse PCA are briefly reviewed. In the remaining part of the paper, a new approach to defining sparse PCA is introduced. Several alternative definitions are considered and illustrated on a well-known data set. Finally, it is demonstrated how one of these versions of sparse PCA can be used as a sparse alternative to the classical rotation methods.
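The contrast the review draws, rotation toward simple structure versus directly sparse solutions, can be seen by comparing loading matrices from off-the-shelf implementations; the sketch below uses scikit-learn's PCA and SparsePCA on a standard data set, with an arbitrary sparsity level:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, SparsePCA

# Classical PCA loadings are dense; sparse PCA reaches an interpretable
# (simple-structure-like) pattern directly, with exact zeros.
X = load_iris().data
X = (X - X.mean(0)) / X.std(0)

pca = PCA(n_components=2).fit(X)
spca = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X)

print("PCA loadings:\n", pca.components_.round(2))
print("Sparse PCA loadings:\n", spca.components_.round(2))
```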

5.
Most existing procedures for sparse principal component analysis (PCA) use a penalty function to obtain a sparse matrix of weights by which a data matrix is post-multiplied to produce PC scores. In this paper, we propose a new sparse PCA procedure that differs from the existing ones in two ways. First, the new procedure does not sparsify the weight matrix; instead, it sparsifies the loading matrix, by which the score matrix is post-multiplied to approximate the data matrix. Second, the cardinality of the loading matrix, i.e., the total number of nonzero loadings, is pre-specified as an integer, without using penalty functions. The procedure is called unpenalized sparse loading PCA (USLPCA). A desirable property of USLPCA is that indices for the percentages of explained variance can be defined in the same form as in standard PCA. We develop an alternating least squares algorithm for USLPCA, which exploits the fact that the PCA loss function can be decomposed as the sum of a term that does not involve the loadings and a term that is easily minimized under cardinality constraints. A procedure is also presented for selecting the best cardinality using information criteria. The procedures are assessed in a simulation study and illustrated with real data examples.
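A minimal reading of the USLPCA alternation, under the decomposition mentioned in the abstract: for fixed orthonormal scores, the cardinality-constrained loading step reduces to hard thresholding of the least-squares loadings, and the score step is an orthogonal Procrustes problem. The sketch below is an assumed simplification, not the authors' algorithm or its information-criterion cardinality selection.

```python
import numpy as np

def hard_threshold(A, card):
    """Keep only the `card` largest-magnitude entries of A, zeroing the rest."""
    cutoff = np.sort(np.abs(A).ravel())[-card]
    return np.where(np.abs(A) >= cutoff, A, 0.0)

def uslpca(X, n_components=2, card=4, n_iter=50):
    """Alternate orthonormal scores F with a loading matrix A of
    pre-specified cardinality (a sketch of the USLPCA idea)."""
    Xc = X - X.mean(0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    F = U[:, :n_components]                      # orthonormal scores
    for _ in range(n_iter):
        A = hard_threshold(Xc.T @ F, card)       # cardinality-constrained loadings
        P, _, Qt = np.linalg.svd(Xc @ A, full_matrices=False)
        F = P @ Qt                               # best orthonormal scores for A
    return F, hard_threshold(Xc.T @ F, card)

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 10))
F, A = uslpca(X)
print((A != 0).sum(), "nonzero loadings")        # equals the chosen cardinality
```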

6.
Recent years have seen active development of various penalized regression methods, such as the LASSO and the elastic net, for analyzing high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Because of the penalties, the length of the estimates can be far from optimal for accurate prediction. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are then determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result establishing simultaneous model selection consistency and parameter estimation consistency of our method in high dimensions. This framework is then generalized so that it can be applied to principal component analysis, partial least squares, and canonical correlation analysis, and we also adapt it for discriminant analysis. Compared with existing methods, in which there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and that the ability to control relationships between the sparse components leads to more accurate classification. Details of the algorithms, theoretical proofs, and R code for all simulation studies are provided in the online supplementary materials.
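The direction-then-length idea can be sketched in a few lines: take only the direction of a penalized fit, then re-estimate the length by a one-dimensional fit along that direction, undoing the shrinkage of the coefficient length. The paper tunes the length (and tuning parameters) by cross-validation; a plain least-squares refit is used below to keep the sketch short, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def regression_by_projection(X, y, alpha=0.1):
    """Toy direction-then-length estimator: lasso supplies the direction,
    a 1-D least-squares refit supplies the length."""
    direction = Lasso(alpha=alpha).fit(X, y).coef_
    nrm = np.linalg.norm(direction)
    if nrm == 0:
        return direction
    direction /= nrm
    z = X @ direction                 # project onto the inferred direction
    length = (z @ y) / (z @ z)        # optimal scalar length
    return length * direction

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 20))
beta = np.zeros(20)
beta[:3] = [2, -1, 1.5]
y = X @ beta + rng.normal(size=100)
print(regression_by_projection(X, y).round(2)[:5])
```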

7.
Sparse PCA by iterative elimination algorithm
In this paper, we propose an iterative elimination algorithm for sparse principal component analysis. It recursively eliminates variables according to a criterion that aims to minimize the loss of explained variance, and reconsiders the sparse principal component analysis problem until the desired sparsity is achieved. Two criteria, the approximated minimal variance loss (AMVL) criterion and the minimal absolute value criterion, are proposed to select the variables eliminated in each iteration. Deflation techniques are discussed for computing multiple principal components. The effectiveness is illustrated by both simulations on synthetic data and applications to real data.
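A sketch of backward elimination for one sparse PC, using the exact variance-loss criterion (recompute the leading eigenvalue after each candidate deletion); the paper's AMVL criterion approximates this to avoid recomputation, and the deflation step for further components is omitted:

```python
import numpy as np

def sparse_pc_by_elimination(X, target_card=3):
    """Drop, one at a time, the variable whose removal loses the least
    leading-eigenvalue variance, until `target_card` variables remain."""
    Xc = X - X.mean(0)
    S = np.cov(Xc, rowvar=False)
    active = list(range(S.shape[0]))
    while len(active) > target_card:
        remaining = []
        for j in active:
            keep = [i for i in active if i != j]
            remaining.append(np.linalg.eigvalsh(S[np.ix_(keep, keep)])[-1])
        # remove the variable whose deletion leaves the largest eigenvalue
        active.remove(active[int(np.argmax(remaining))])
    v = np.zeros(S.shape[0])
    _, V = np.linalg.eigh(S[np.ix_(active, active)])
    v[active] = V[:, -1]
    return v

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 8))
X[:, :3] += np.outer(rng.normal(size=100), [2, 2, 2])
print(np.nonzero(sparse_pc_by_elimination(X))[0])   # likely [0 1 2]
```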

8.
This article compares two approaches to aggregating multiple inputs and multiple outputs in the evaluation of decision making units (DMUs): data envelopment analysis (DEA) and principal component analysis (PCA). DEA, a non-statistical efficiency technique, employs linear programming to weight the inputs/outputs and rank the performance of DMUs. PCA, a multivariate statistical method, constructs new composite measures from the inputs/outputs. Both methods are applied to three real-world data sets that characterize the economic performance of Chinese cities, and they yield consistent and mutually complementary results. Nonparametric statistical tests are employed to validate the consistency between the rankings obtained from DEA and PCA.
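A small illustration of the PCA side of the comparison and the nonparametric consistency check: score each DMU by the first principal component of its output/input ratios and compare two rankings with a Spearman test. The ratio construction is one common convention, assumed here; the DEA ranking is replaced by a placeholder.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(6)
inputs = rng.uniform(1, 10, size=(30, 2))       # 30 DMUs, 2 inputs
outputs = rng.uniform(1, 10, size=(30, 3))      # 3 outputs

ratios = outputs[:, :, None] / inputs[:, None, :]   # all output/input ratios
Z = ratios.reshape(30, -1)
Z = (Z - Z.mean(0)) / Z.std(0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
pca_rank = (Z @ Vt[0]).argsort().argsort()          # rank DMUs by PC1 score

other_rank = rng.permutation(30)                    # stand-in for a DEA ranking
rho, pval = spearmanr(pca_rank, other_rank)         # nonparametric rank test
print(round(rho, 2), round(pval, 3))
```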

9.
Kernel principal component analysis (KPCA) extends linear PCA from a real vector space to any high-dimensional kernel feature space. The sensitivity of linear PCA to outliers is well known, and various robust alternatives have been proposed in the literature. For KPCA, such robust versions have received considerably less attention. In this article we present kernel versions of three robust PCA algorithms: spherical PCA, projection pursuit, and ROBPCA. These robust KPCA algorithms are analyzed in a classification context by applying discriminant analysis to the KPCA scores. The performance of the different robust KPCA algorithms is studied in a simulation study comparing misclassification percentages on both clean and contaminated data. An outlier map is constructed to visualize outliers in such classification problems. A real-life example from protein classification illustrates the usefulness of robust KPCA and its corresponding outlier map.
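Spherical PCA, one of the three robust algorithms that the article kernelizes, is simple to sketch in the linear space: project observations onto a unit sphere around the spatial median so that outliers lose their leverage, then run ordinary PCA on the projected points. The kernel version is not reproduced here.

```python
import numpy as np

def spatial_median(X, n_iter=100, eps=1e-8):
    """Weiszfeld iterations for the spatial (L1) median."""
    m = X.mean(0)
    for _ in range(n_iter):
        d = np.linalg.norm(X - m, axis=1) + eps
        m = (X / d[:, None]).sum(0) / (1.0 / d).sum()
    return m

def spherical_pca(X, n_components=2):
    """Project onto the unit sphere around the spatial median, then PCA."""
    D = X - spatial_median(X)
    S = D / (np.linalg.norm(D, axis=1, keepdims=True) + 1e-12)
    _, _, Vt = np.linalg.svd(S - S.mean(0), full_matrices=False)
    return Vt[:n_components]

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 5))
X[:5] += 50                        # gross outliers barely affect the result
print(spherical_pca(X).round(2))
```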

10.
Performance data are usually collected in order to build well-defined performance indicators. Since such data may conceal additional information that can be revealed by secondary analysis, we believe that mining performance data may be fruitful. We also note that performance databases usually contain both qualitative and quantitative variables for which it may be inappropriate to assume a specific (multivariate) underlying distribution, so a suitable technique for dealing with these issues should be adopted. In this work, we consider nonlinear principal component analysis (PCA) with optimal scaling, a method developed to incorporate all types of variables and to discover and handle nonlinear relationships. The reader is offered a case study in which a student opinion database is mined. Though generally gathered to provide evidence of teaching ability, these data are exploited here to provide a more general performance evaluation tool for those in charge of managing universities. We show how nonlinear PCA with optimal scaling applied to student opinion data enables users to point out some strengths and weaknesses of educational programs and services within a university.
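A toy version of the optimal-scaling alternation (in the homogeneity-analysis style, an assumed simplification of CATPCA-type software): category quantifications and object scores are updated in turn, so categorical variables receive data-driven numeric scores. Ordinal restrictions and multiple components are omitted.

```python
import numpy as np

def nonlinear_pca_1d(cats, n_iter=50):
    """One-component nonlinear PCA with optimal scaling via alternating
    least squares: quantify each category by the mean object score of its
    members, then average quantifications into new object scores."""
    n, p = cats.shape
    x = np.random.default_rng(0).normal(size=n)    # object scores
    for _ in range(n_iter):
        Q = np.zeros(cats.shape, dtype=float)
        for j in range(p):                         # quantify each variable
            for c in np.unique(cats[:, j]):
                mask = cats[:, j] == c
                Q[mask, j] = x[mask].mean()        # category quantification
        x = Q.mean(1)
        x = (x - x.mean()) / x.std()               # fix the scale
    return x, Q

rng = np.random.default_rng(8)
latent = rng.normal(size=200)
cats = np.stack([np.digitize(latent + 0.5 * rng.normal(size=200),
                             [-1, 0, 1]) for _ in range(4)], axis=1)
x, Q = nonlinear_pca_1d(cats)
print(abs(np.corrcoef(x, latent)[0, 1]).round(2))  # recovers the latent trait
```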

11.
This article considers a new type of principal component analysis (PCA) that adaptively reflects the information in the data. Ordinary PCA is useful for dimension reduction and for identifying important features of multivariate data. However, it uses only the second moment of the data, and consequently it is not efficient for analyzing real observations when the data are skewed or asymmetric. To extend the scope of PCA to non-Gaussian data that cannot be well represented by the second moment, a new approach to PCA is proposed. The core of the methodology is to use a composite asymmetric Huber function, defined as a weighted linear combination of modified Huber loss functions, in place of the conventional square loss function. A practical algorithm to implement the data-adaptive PCA is discussed. Results from numerical studies, including a simulation study and real data analysis, demonstrate the promising empirical properties of the proposed approach. Supplementary materials for this article are available online.
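A sketch of the loss-function idea: replace the square loss on residuals with an asymmetric Huber function and minimize over the first direction with a generic optimizer. The exact composite form and weights used in the article are not reproduced; the loss below is an assumed simple version.

```python
import numpy as np
from scipy.optimize import minimize

def asym_huber(r, tau=0.75, c=1.345):
    """Asymmetric Huber loss: quadratic near zero, linear in the tails,
    with the two signs weighted by tau and 1 - tau (an assumed form)."""
    a = np.where(r >= 0, tau, 1 - tau)
    quad = 0.5 * r ** 2
    lin = c * (np.abs(r) - 0.5 * c)
    return a * np.where(np.abs(r) <= c, quad, lin)

def asym_huber_pca_direction(X, tau=0.75):
    """First 'principal direction' minimizing the asymmetric Huber loss of
    residuals off the fitted line -- a sketch, not the authors' algorithm."""
    Xc = X - X.mean(0)
    def obj(v):
        v = v / (np.linalg.norm(v) + 1e-12)
        R = Xc - np.outer(Xc @ v, v)       # residuals off the direction
        return asym_huber(R, tau).sum()
    v0 = np.linalg.svd(Xc, full_matrices=False)[2][0]   # warm start
    res = minimize(obj, v0, method="Nelder-Mead")
    return res.x / np.linalg.norm(res.x)

rng = np.random.default_rng(9)
X = rng.gamma(2.0, size=(100, 4))          # skewed (asymmetric) data
print(asym_huber_pca_direction(X).round(2))
```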

12.
Functional principal component analysis (PCA) involves new considerations about how distances are measured (the norm). Some properties arising in the functional framework (e.g., smoothing) can be taken into account through an inner product on the data space, but such an inner product may worsen interpretability and/or computational tractability. The results obtained in this paper establish equivalences between PCA with the proposed inner product and a certain PCA with a given well-suited inner product. These results are proved in the theoretical framework of Hilbert-space-valued random variables, in which multivariate and functional PCA appear jointly as particular cases.
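The kind of equivalence the paper establishes can be illustrated in finite dimensions: PCA with respect to an inner product ⟨x, y⟩ = x′My, with M symmetric positive definite (e.g., encoding smoothing), reduces to a standard symmetric eigenproblem after transforming by the square root of M. A numerical check of this identity:

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(50, 6))
A = rng.normal(size=(6, 6))
M = A @ A.T + 6 * np.eye(6)             # an SPD metric on the data space

w, V = np.linalg.eigh(M)
M_half = V @ np.diag(np.sqrt(w)) @ V.T  # symmetric square root of M

Xc = X - X.mean(0)
S = Xc.T @ Xc / len(X)                  # usual covariance
# Leading M-metric principal value: eigenproblem of S M, equivalently the
# standard symmetric eigenproblem of M^{1/2} S M^{1/2}.
lam_sym = np.linalg.eigvalsh(M_half @ S @ M_half)[-1]
lam_gep = np.sort(np.linalg.eigvals(S @ M).real)[-1]
print(np.isclose(lam_sym, lam_gep))     # True: the two PCAs agree
```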

13.
This article presents and compares two approaches of principal component (PC) analysis for two-dimensional functional data on a possibly irregular domain. The first approach applies the singular value decomposition of the data matrix obtained from a fine discretization of the two-dimensional functions. When the functions are only observed at discrete points that are possibly sparse and may differ from function to function, this approach incorporates an initial smoothing step prior to the singular value decomposition. The second approach employs a mixed effects model that specifies the PC functions as bivariate splines on triangulations and the PC scores as random effects. We apply the thin-plate penalty for regularizing the function estimation and develop an effective expectation–maximization algorithm for calculating the penalized likelihood estimates of the parameters. The mixed effects model-based approach integrates scatterplot smoothing and functional PC analysis in a unified framework and is shown in a simulation study to be more efficient than the two-step approach that separately performs smoothing and PC analysis. The proposed methods are applied to analyze the temperature variation in Texas using 100 years of temperature data recorded by Texas weather stations. Supplementary materials for this article are available online.
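The first (discretize-smooth-SVD) approach is straightforward to sketch on a regular grid; Gaussian filtering stands in for the initial smoothing step, and the mixed-effects bivariate-spline approach is not reproduced:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(11)
grid = np.linspace(0, 1, 30)
s, t = np.meshgrid(grid, grid)
basis = np.sin(2 * np.pi * s) * np.cos(np.pi * t)      # true PC surface

surfaces = np.stack([c * basis + 0.3 * rng.normal(size=basis.shape)
                     for c in rng.normal(size=40)])     # 40 noisy functions
smoothed = np.stack([gaussian_filter(f, sigma=1.5) for f in surfaces])

Y = smoothed.reshape(40, -1)                            # vectorize each surface
Y -= Y.mean(0)
U, sv, Vt = np.linalg.svd(Y, full_matrices=False)
pc_surface = Vt[0].reshape(30, 30)                      # estimated PC function
scores = U[:, 0] * sv[0]                                # PC scores
print(pc_surface.shape, scores[:3].round(2))
```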

14.
Principal component analysis (PCA) is often used to visualize data when the rows and the columns are both of interest. In such a setting, there is a lack of inferential methods for the PCA output. We study the asymptotic variance of a fixed-effects model for PCA and propose several approaches to assessing the variability of PCA estimates: a method based on a parametric bootstrap, a new cell-wise jackknife, and a computationally cheaper approximation to the jackknife. We visualize the confidence regions by Procrustes rotation. Using a simulation study, we compare the proposed methods and highlight the strengths and drawbacks of each as we vary the number of rows, the number of columns, and the strength of the relationships between variables.
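A sketch of the parametric-bootstrap approach with Procrustes alignment: resample from an estimated fixed-effects model (low-rank signal plus Gaussian noise), recompute the PC scores, and rotate each replicate onto the reference fit before measuring spread. The jackknife variants are omitted, and the noise model is an assumed simplification.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(12)
n, p, rank = 40, 8, 2
signal = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, p))
X = signal + 0.5 * rng.normal(size=(n, p))

def pc_scores(Y, k=2):
    Yc = Y - Y.mean(0)
    U, s, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :k] * s[:k]

ref = pc_scores(X)                                     # reference configuration
Xc = X - X.mean(0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
fitted = (U[:, :rank] * s[:rank]) @ Vt[:rank]          # low-rank signal estimate
sigma = np.sqrt(((Xc - fitted) ** 2).mean())           # residual noise level

reps = []
for _ in range(200):
    Xb = fitted + sigma * rng.normal(size=X.shape)     # parametric resample
    Fb = pc_scores(Xb)
    R, _ = orthogonal_procrustes(Fb, ref)              # align to the reference
    reps.append(Fb @ R)
spread = np.stack(reps).std(0)                         # per-cell variability
print(spread.mean(0).round(2))
```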

15.
An augmented Lagrangian approach for sparse principal component analysis
Principal component analysis (PCA) is a widely used technique for data analysis and dimension reduction with numerous applications in science and engineering. However, standard PCA suffers from the fact that the principal components (PCs) are usually linear combinations of all the original variables, and it is thus often difficult to interpret the PCs. To alleviate this drawback, various sparse PCA approaches have been proposed in the literature (Cadima and Jolliffe in J Appl Stat 22:203–214, 1995; d’Aspremont et al. in J Mach Learn Res 9:1269–1294, 2008; d’Aspremont et al. in SIAM Rev 49:434–448, 2007; Jolliffe in J Appl Stat 22:29–35, 1995; Journée et al. in J Mach Learn Res 11:517–553, 2010; Jolliffe et al. in J Comput Graph Stat 12:531–547, 2003; Moghaddam et al. in Advances in Neural Information Processing Systems 18:915–922, MIT Press, Cambridge, 2006; Shen and Huang in J Multivar Anal 99(6):1015–1034, 2008; Zou et al. in J Comput Graph Stat 15(2):265–286, 2006). Despite their success in achieving sparsity, these methods lose some important properties enjoyed by standard PCA, such as uncorrelatedness of the PCs and orthogonality of the loading vectors. Also, the total explained variance that they attempt to maximize can be too optimistic. In this paper we propose a new formulation for sparse PCA, aiming at finding sparse and nearly uncorrelated PCs with orthogonal loading vectors while explaining as much of the total variance as possible. We also develop a novel augmented Lagrangian method for solving a class of nonsmooth constrained optimization problems, which is well suited to our formulation of sparse PCA. We show that it converges to a feasible point and, under some regularity assumptions, to a stationary point. Additionally, we propose two nonmonotone gradient methods for solving the augmented Lagrangian subproblems and establish their global and local convergence. Finally, we compare our sparse PCA approach with several existing methods on synthetic (Zou et al. in J Comput Graph Stat 15(2):265–286, 2006), Pitprops (Jeffers in Appl Stat 16:225–236, 1967), and gene expression data (Chin et al. in Cancer Cell 10:529–541, 2006). The computational results demonstrate that the sparse PCs produced by our approach substantially outperform those produced by other methods in terms of total explained variance, correlation of PCs, and orthogonality of loading vectors. Moreover, experiments on random data show that our method is capable of solving large-scale problems within a reasonable amount of time.
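The constrained formulation lends itself to a bare-bones augmented Lagrangian sketch: penalize violations of V′V = I, update multipliers in an outer loop, and handle the nonsmooth ℓ1 term with subgradient steps. This only illustrates the formulation; the paper's nonmonotone gradient solver and its convergence guarantees are far beyond this toy, and subgradient steps give approximately (not exactly) zero loadings.

```python
import numpy as np

def al_sparse_pca(S, k=2, rho=0.5, mu=20.0, lr=5e-4, outer=30, inner=300):
    """Minimize -tr(V'SV) + rho*||V||_1 subject to V'V = I via an augmented
    Lagrangian on the orthogonality constraint (a minimal sketch)."""
    I = np.eye(k)
    V = np.linalg.eigh(S)[1][:, -k:]           # warm start: leading eigenvectors
    Lam = np.zeros((k, k))                     # multipliers for V'V - I = 0
    for _ in range(outer):
        for _ in range(inner):                 # subgradient descent on the AL
            H = V.T @ V - I                    # constraint violation
            g = (-2 * S @ V + rho * np.sign(V)
                 + V @ (Lam + Lam.T) + 2 * mu * V @ H)
            V -= lr * g
        Lam += mu * (V.T @ V - I)              # multiplier update
    return V

rng = np.random.default_rng(13)
B = rng.normal(size=(10, 10))
S = B @ B.T                                    # a covariance-like matrix
V = al_sparse_pca(S)
print(np.round(V.T @ V, 2))                    # should approach the identity
print(np.round(V[:4], 2))                      # shrunken (near-sparse) loadings
```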

16.
This paper proposes the application of a principal components proportional hazards regression model in condition-based maintenance (CBM) optimization. The Cox proportional hazards model with time-dependent covariates is considered. Principal component analysis (PCA) can be applied to covariates (measurements) to reduce the number of variables included in the model, as well as to eliminate possible collinearity between the covariates. The main issues and problems in using the proposed methodology are discussed. PCA is applied to a simulated CBM data set and two real data sets obtained from industry: oil analysis data and vibration data. Reasonable results are obtained.
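A sketch of the two-stage idea, assuming the third-party lifelines package and, for brevity, time-fixed covariates (the article uses the time-dependent Cox model): PCA removes collinearity among the condition-monitoring measurements, and the hazard model is fitted on the leading components. All variable names and the simulated degradation setup are illustrative.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(14)
n = 200
latent = rng.normal(size=n)                        # true degradation level
Z = np.column_stack([latent + 0.3 * rng.normal(size=n)
                     for _ in range(5)])           # 5 collinear measurements
T = rng.exponential(1.0 / np.exp(0.8 * latent))    # failure times
E = (T < np.quantile(T, 0.8)).astype(int)          # censor the longest 20%
T = np.minimum(T, np.quantile(T, 0.8))

Zc = (Z - Z.mean(0)) / Z.std(0)
_, _, Vt = np.linalg.svd(Zc, full_matrices=False)
pcs = Zc @ Vt[:2].T                                # first two PC covariates

df = pd.DataFrame({"pc1": pcs[:, 0], "pc2": pcs[:, 1], "T": T, "E": E})
cph = CoxPHFitter().fit(df, duration_col="T", event_col="E")
print(cph.summary[["coef", "p"]].round(3))         # PC1 should carry the hazard
```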

17.
We propose a new algorithm for sparse estimation of eigenvectors in generalized eigenvalue problems (GEPs). The GEP arises in a number of modern data-analytic situations and statistical methods, including principal component analysis (PCA), multiclass linear discriminant analysis (LDA), canonical correlation analysis (CCA), sufficient dimension reduction (SDR), and invariant coordinate selection. We propose to modify the standard generalized orthogonal iteration with a sparsity-inducing penalty for the eigenvectors. To achieve this goal, we generalize the equation-solving step of orthogonal iteration to a penalized convex optimization problem. The resulting algorithm, called penalized orthogonal iteration, provides accurate estimation of the true eigenspace when it is sparse. Also proposed is a computationally more efficient alternative that works well for PCA and LDA problems. Numerical studies reveal that the proposed algorithms are competitive and that our tuning procedure works well. We demonstrate applications of the proposed algorithm to obtain sparse estimates for PCA, multiclass LDA, CCA, and SDR. Supplementary materials for this article are available online.
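A sketch in the spirit of penalized orthogonal iteration for the GEP Av = λBv: each sweep takes a generalized power step, thresholds for sparsity, and re-orthonormalizes. The paper solves a penalized convex subproblem at each sweep; plain soft thresholding is an assumed simplification.

```python
import numpy as np

def penalized_orthogonal_iteration(A, B, k=2, lam=0.1, n_iter=100):
    """Sparse generalized-eigenvector sketch: solve B W = A V (the linear
    step of orthogonal iteration), soft-threshold W, re-orthonormalize."""
    p = A.shape[0]
    rng = np.random.default_rng(0)
    V, _ = np.linalg.qr(rng.normal(size=(p, k)))
    for _ in range(n_iter):
        W = np.linalg.solve(B, A @ V)                   # generalized power step
        W = np.sign(W) * np.maximum(np.abs(W) - lam, 0) # sparsity-inducing step
        V, _ = np.linalg.qr(W)                          # re-orthonormalize
    return V

rng = np.random.default_rng(15)
v = np.zeros(12)
v[:3] = 1 / np.sqrt(3)                                  # sparse signal direction
A = 5 * np.outer(v, v) + np.eye(12)                     # "signal" matrix
B = np.eye(12)                                          # GEP reduces to PCA here
V = penalized_orthogonal_iteration(A, B, k=1, lam=0.2)
print(np.nonzero(np.abs(V[:, 0]) > 1e-8)[0])            # mostly [0 1 2]
```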

18.

Spatio-temporal data are common in practice. Existing methods for analyzing such data often employ parametric modelling with different sets of model assumptions. However, spatio-temporal data in practice often have complicated structures, including complex spatial and temporal variation, latent spatio-temporal correlation, and unknown data distributions. Because such structures reflect the complicated impact of confounding variables, such as weather, demographic variables, lifestyles, and other cultural and environmental factors, they are usually too complicated to describe with parametric models. In this paper, we suggest a general modelling framework for estimating the mean and covariance functions of spatio-temporal data using a three-step local smoothing procedure. The suggested method can accommodate the complicated structure of real spatio-temporal data well. Under some regularity conditions, the consistency of the proposed estimators is established. Both simulation studies and a real-data application show that the proposed method can work well in practice.
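A one-step stand-in for the first stage of the three-step procedure: a kernel-weighted local-constant estimate of the mean function μ(s, t). The article's local linear steps and covariance estimation are omitted, and the bandwidths here are arbitrary.

```python
import numpy as np

def st_mean(coords, times, y, coord0, t0, h_s=0.5, h_t=0.5):
    """Local-constant (Nadaraya-Watson) estimate of mu(s, t) with a
    Gaussian product kernel over space and time."""
    ds = np.linalg.norm(coords - coord0, axis=1) / h_s
    dt = np.abs(times - t0) / h_t
    w = np.exp(-0.5 * (ds ** 2 + dt ** 2))
    return (w * y).sum() / w.sum()

rng = np.random.default_rng(16)
coords = rng.uniform(0, 1, size=(500, 2))
times = rng.uniform(0, 1, size=500)
mu = np.sin(2 * np.pi * times) + coords[:, 0]     # true mean surface
y = mu + 0.3 * rng.normal(size=500)               # spatio-temporal noise

est = st_mean(coords, times, y, np.array([0.5, 0.5]), 0.25, h_s=0.2, h_t=0.1)
print(round(est, 2))                              # near sin(pi/2) + 0.5 = 1.5
```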


19.
This article proposes a new approach to principal component analysis (PCA) for interval-valued data. Unlike classical observations, which are represented by single points in the p-dimensional space ℝ^p, interval-valued observations are represented by hyper-rectangles in ℝ^p and, as such, have an internal structure that does not exist in classical observations. As a consequence, statistical methods for classical data must be modified to account for the structure of the hyper-rectangles before they can be applied to interval-valued data. This article extends the classical PCA method to interval-valued data by using the so-called symbolic covariance to determine the principal component (PC) space so as to reflect the total variation of the interval-valued data. The article also provides a new approach to constructing the observations in the PC space for better visualization. This new representation of the observations reflects their true structure in the PC space. Supplementary materials for this article are available online.
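A sketch of the interval-data geometry using the simple "centres" convention (classical covariance of midpoints) rather than the article's symbolic covariance: PCA on midpoints, then each hyper-rectangle's vertices are mapped into the PC plane, giving interval-valued PC coordinates for visualization.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(17)
lo = rng.normal(size=(20, 3))
hi = lo + rng.uniform(0.1, 1.0, size=(20, 3))   # intervals [lo, hi] in R^3

mid = (lo + hi) / 2
center = mid.mean(0)
_, _, Vt = np.linalg.svd(mid - center, full_matrices=False)
W = Vt[:2].T                                    # loadings of the first two PCs

corners = np.array(list(product([0, 1], repeat=3)))   # 8 vertex selectors
for i in range(2):                              # map the first two rectangles
    verts = (np.where(corners, hi[i], lo[i]) - center) @ W
    print("obs", i, "PC1 interval:",
          np.round([verts[:, 0].min(), verts[:, 0].max()], 2))
```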

20.
Singular value decomposition (SVD) is a useful tool in functional data analysis (FDA). Compared to principal component analysis (PCA), SVD is more fundamental, because it simultaneously provides the PCAs in both the row and column spaces. We compare SVD and PCA from the FDA viewpoint and extend the usual SVD to variants with different centerings. A generalized scree plot is proposed to select an appropriate centering in practice. Several useful matrix views of the SVD components are introduced to explore different features in the data, including SVD surface plots, image plots, curve movies, and rotation movies. These methods visualize both column and row information of a two-way matrix simultaneously, relate the matrix to relevant curves, show local variations, and highlight interactions between columns and rows. Several toy examples are designed to compare the different variants of SVD, and real data examples are used to illustrate the usefulness of the visualization methods.
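The effect of the different centerings is easy to demonstrate: compare the share of variation captured by the first SVD component under no, column, row, and double centering (the numbers a generalized scree plot would display). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(18)
X = rng.normal(size=(30, 15)) + np.linspace(0, 3, 15)  # strong column offsets

def first_share(Y):
    """Fraction of squared Frobenius norm carried by the first component."""
    s = np.linalg.svd(Y, compute_uv=False)
    return (s[0] ** 2 / (s ** 2).sum()).round(2)

variants = {
    "raw":    X,
    "column": X - X.mean(0),
    "row":    X - X.mean(1, keepdims=True),
    "double": X - X.mean(0) - X.mean(1, keepdims=True) + X.mean(),
}
for name, Y in variants.items():
    print(name, first_share(Y))
```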

