Similar Articles
 10 similar articles found (search time: 109 ms)
1.
A cluster-based method for constructing sparse principal components is proposed. The method initially forms clusters of variables, using a new clustering approach called the semi-partition, in two steps. First, the variables are ordered sequentially according to a criterion involving the correlations between variables. Then, the ordered variables are split into two parts based on their generalized variance. The first group of variables becomes an output cluster, while the second serves as input for another run of the sequential process. After the optimal clusters have been formed, sparse components are constructed from the singular value decomposition of the data matrix of each cluster. The method is applied both to simple data sets with fewer variables (p) than observations (n) and to large gene expression data sets with p ≫ n. The resulting cluster-based sparse principal components are very promising as evaluated by objective criteria. The method is also compared with other existing approaches and is found to perform well.
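The cluster-then-SVD idea in item 1 can be sketched as follows. This is a toy illustration, not the authors' semi-partition algorithm: here the variables are simply ordered by their strongest absolute correlation and split into equal-sized groups, whereas the paper splits the ordered variables by generalized variance; the function name `semi_partition_sketch` is hypothetical.

```python
import numpy as np

def semi_partition_sketch(X, n_clusters=2):
    # Toy stand-in: order variables by their strongest absolute correlation
    # with any other variable, split into equal groups, then take the leading
    # right singular vector of each cluster's (centered) data block.
    p = X.shape[1]
    R = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(R, 0.0)
    order = np.argsort(-R.max(axis=0))          # strongest-correlated first
    groups = np.array_split(order, n_clusters)
    components = np.zeros((p, n_clusters))
    for k, g in enumerate(groups):
        Xg = X[:, g] - X[:, g].mean(axis=0)
        _, _, Vt = np.linalg.svd(Xg, full_matrices=False)
        components[g, k] = Vt[0]                # sparse: zero outside the cluster
    return components
```

Each column of the result is a unit-norm loading vector supported only on one cluster, which is what makes the components sparse by construction.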

2.
Robust principal component analysis (RPCA), a fundamental tool in statistics and data science, has been studied extensively; its core principle is to decompose the observed data into a low-rank part and a sparse part. Based on a nonconvex RPCA model, this paper proposes a new Gauss-Seidel-type alternating descent direction method built on gradient methods and a nonmonotone search technique. The new algorithm alternately updates the variables associated with the low-rank part and the sparse part: the low-rank variables are updated by a single gradient-descent step with an exact step size, while the sparse variables are updated via the nonmonotone search technique. Global convergence of the new algorithm is established under certain conditions. Numerical experiments demonstrate the effectiveness of the new algorithm.
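The low-rank/sparse alternation in item 2 can be illustrated with a much-simplified stand-in: plain alternating minimization with an exact rank-r update and a soft-threshold update, in place of the paper's gradient step with exact step size and nonmonotone search. All parameter values here are illustrative assumptions.

```python
import numpy as np

def rpca_alternating(M, rank=1, lam=0.1, iters=50):
    # Simplified alternation for M ≈ L (low-rank) + S (sparse):
    #   L-step: best rank-r approximation of M - S (truncated SVD),
    #   S-step: entrywise soft-thresholding of the residual M - L.
    S = np.zeros_like(M)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]            # low-rank update
        R = M - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)   # sparse update
    return L, S
```

By construction L has rank at most `rank`, and after the final S-step every entry of M - L - S is at most `lam` in magnitude.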

3.
Principal component analysis (PCA) is a widely used tool for data analysis and dimension reduction in applications throughout science and engineering. However, the principal components (PCs) can sometimes be difficult to interpret, because they are linear combinations of all the original variables. To facilitate interpretation, sparse PCA produces modified PCs with sparse loadings, i.e. loadings with very few non-zero elements. In this paper, we propose a new sparse PCA method, namely sparse PCA via regularized SVD (sPCA-rSVD). We use the connection of PCA with singular value decomposition (SVD) of the data matrix and extract the PCs through solving a low rank matrix approximation problem. Regularization penalties are introduced to the corresponding minimization problem to promote sparsity in PC loadings. An efficient iterative algorithm is proposed for computation. Two tuning parameter selection methods are discussed. Some theoretical results are established to justify the use of sPCA-rSVD when only the data covariance matrix is available. In addition, we give a modified definition of variance explained by the sparse PCs. The sPCA-rSVD provides a uniform treatment of both classical multivariate data and high-dimension-low-sample-size (HDLSS) data. Further understanding of sPCA-rSVD and some existing alternatives is gained through simulation studies and real data examples, which suggests that sPCA-rSVD provides competitive results.
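The rank-1 core of the regularized-SVD iteration in item 3 can be sketched with a soft-threshold (lasso-type) penalty. This is a minimal sketch assuming an ℓ1 penalty with a fixed tuning parameter `lam`; the paper's full procedure, penalty options, and tuning-parameter selection are not reproduced here.

```python
import numpy as np

def spca_rsvd_sketch(X, lam=0.5, iters=100):
    # Rank-1 sparse PCA via regularized SVD: alternate between the left
    # singular vector u and a soft-thresholded loading vector v, so that
    # u v' approximates X while v stays sparse.
    u = np.linalg.svd(X, full_matrices=False)[0][:, 0]   # warm start from SVD
    v = None
    for _ in range(iters):
        v = X.T @ u
        v = np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)   # sparsify loadings
        Xv = X @ v
        nrm = np.linalg.norm(Xv)
        if nrm == 0.0:
            break
        u = Xv / nrm
    return u, v
```

On data where a few variables carry the signal and the rest are near-zero noise, the thresholding zeroes out the noise loadings exactly.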

4.
A variable selection method using global score estimation is proposed; it is applicable as a selection criterion in any multivariate method without external variables, such as principal component analysis, factor analysis, and correspondence analysis. The method selects a subset of variables by which the original global scores are approximated as closely as possible in the least-squares sense, where the global scores (e.g. principal component scores, factor scores, and individual scores) are computed from the selected variables. Because global scores are usually orthogonal, the estimated global scores should be restricted to being mutually orthogonal. Depending on how that restriction is satisfied, we propose three computational steps to estimate the scores. Example data are analyzed to demonstrate the performance and usefulness of the proposed method: the proposed algorithm is evaluated, and the results obtained using four cost-saving selection procedures are compared. The example shows that combining these steps and procedures yields more accurate results quickly.
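The selection criterion in item 4 can be illustrated with a toy greedy forward search: pick, one at a time, the variable whose least-squares fit best reproduces the full-data principal component scores. This is a hypothetical sketch of the criterion only; it ignores the orthogonality restriction and the paper's three computational steps and four cost-saving procedures.

```python
import numpy as np

def select_vars_by_scores(X, n_select, n_comp=2):
    # Greedy forward selection: at each step add the variable j that
    # minimizes the least-squares error of approximating the full PC
    # scores F from the selected columns of (centered) X.
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    F = U[:, :n_comp] * s[:n_comp]              # full-data PC scores
    selected = []
    for _ in range(n_select):
        best, best_err = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            Xs = Xc[:, selected + [j]]
            B, *_ = np.linalg.lstsq(Xs, F, rcond=None)
            err = np.linalg.norm(F - Xs @ B)
            if err < best_err:
                best, best_err = j, err
        selected.append(best)
    return selected
```

When a couple of variables dominate the variance, the greedy search recovers exactly those variables.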

5.
Functional principal component analysis is the preliminary step for representing the data in a lower-dimensional space and capturing the main modes of variability of the data by means of a small number of components that are linear combinations of the original variables. The sensitivity of the variance and covariance functions to irregular observations makes this method vulnerable to outliers, so it may fail to capture the variation of the regular observations. In this study, we propose a robust functional principal component analysis that finds the linear combinations of the original variables containing most of the information, even in the presence of outliers, and that flags functional outliers. We demonstrate the performance of the proposed method in an extensive simulation study and on two data sets, from chemometrics and environmental science.
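A crude version of the flag-and-protect idea in item 5 can be sketched on discretized curves: flag curves far from the pointwise median using a MAD cutoff, then compute the components from the clean curves only. This is a toy stand-in for the paper's robust estimator, not a reimplementation; the cutoff constant `c` and the MAD rule are assumptions.

```python
import numpy as np

def robust_fpca_sketch(curves, n_comp=2, c=3.0):
    # curves: (n_curves, n_gridpoints) array of discretized functions.
    # Flag outlying curves by their distance to the pointwise median
    # (median +/- c * MAD rule), then run plain PCA on the clean curves.
    med = np.median(curves, axis=0)                 # robust center
    dist = np.linalg.norm(curves - med, axis=1)
    m = np.median(dist)
    mad = 1.4826 * np.median(np.abs(dist - m))
    flags = dist > m + c * mad                      # functional outlier flags
    _, _, Vt = np.linalg.svd(curves[~flags] - med, full_matrices=False)
    return Vt[:n_comp], flags
```

Because the gross outlier is excluded before the SVD, it cannot pull the estimated components toward itself, which is the failure mode of the non-robust estimator.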

6.
The problem of recovering a low-rank matrix from a set of observations corrupted with gross sparse error is known as robust principal component analysis (RPCA) and has many applications in computer vision, image processing and web data ranking. It has been shown that under certain conditions, the solution to the NP-hard RPCA problem can be obtained by solving a convex optimization problem, namely the robust principal component pursuit (RPCP). Moreover, if the observed data matrix has also been corrupted by a dense noise matrix in addition to gross sparse error, then the stable principal component pursuit (SPCP) problem is solved to recover the low-rank matrix. In this paper, we develop efficient algorithms with provable iteration complexity bounds for solving RPCP and SPCP. Numerical results on problems with millions of variables and constraints, such as foreground extraction from surveillance video, shadow and specularity removal from face images, and video denoising from heavily corrupted data, show that our algorithms are competitive with current state-of-the-art solvers for RPCP and SPCP in terms of accuracy and speed.
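The convex RPCP problem of item 6, min ||L||* + λ||S||₁ subject to L + S = M, can be attacked with a basic fixed-penalty ADMM built from singular-value thresholding and entrywise soft-thresholding. This is a minimal sketch, not the paper's algorithms (which come with iteration complexity bounds this sketch lacks); the default λ = 1/√max(m,n) and the scale of μ follow common conventions in the RPCA literature and are assumptions here.

```python
import numpy as np

def svt(X, tau):
    # Singular value thresholding: proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def rpcp_admm(M, lam=None, mu=None, iters=200):
    # Fixed-penalty ADMM for min ||L||_* + lam * ||S||_1  s.t.  L + S = M.
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else (m * n) / (4.0 * np.abs(M).sum())
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                            # dual variable
    for _ in range(iters):
        L = svt(M - S + Y / mu, 1.0 / mu)           # nuclear-norm prox
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)  # l1 prox
        Y = Y + mu * (M - L - S)                    # dual ascent on L + S = M
    return L, S
```

On a small synthetic rank-1 plus 5%-sparse instance this scheme typically drives the constraint residual down and separates the two parts, though with a fixed μ convergence is slower than the accelerated methods the paper develops.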

7.
Principal component analysis (PCA) is an important tool for dimension reduction in multivariate analysis. Regularized PCA methods, such as sparse PCA and functional PCA, have been developed to incorporate special features in many real applications. Sometimes additional variables (referred to as supervision) are measured on the same set of samples, which can potentially drive low-rank structures of the primary data of interest. Classical PCA methods cannot make use of such supervision data. In this article, we propose a supervised sparse and functional principal component (SupSFPC) framework that can incorporate supervision information to recover underlying structures that are more interpretable. The framework unifies and generalizes several existing methods and flexibly adapts to the practical scenarios at hand. The SupSFPC model is formulated in a hierarchical fashion using latent variables. We develop an efficient modified expectation-maximization (EM) algorithm for parameter estimation. We also implement fast data-driven procedures for tuning parameter selection. Our comprehensive simulation and real data examples demonstrate the advantages of SupSFPC. Supplementary materials for this article are available online.

8.
In this article, we propose a new framework for matrix factorization based on principal component analysis (PCA) where sparsity is imposed. The structure to impose sparsity is defined in terms of groups of correlated variables found in correlation matrices or maps. The framework is based on three new contributions: an algorithm to identify the groups of variables in correlation maps, a visualization for the resulting groups, and a matrix factorization. Together with a method to compute correlation maps with minimum noise level, referred to as missing-data for exploratory data analysis (MEDA), these three contributions constitute a complete matrix factorization framework. Two real examples are used to illustrate the approach and compare it with PCA, sparse PCA, and structured sparse PCA. Supplementary materials for this article are available online.
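The group-finding step of item 8 can be sketched with an elementary device: treat |corr(i, j)| > threshold as edges of a graph and take its connected components as the variable groups, then build one loading vector per group. This is a hypothetical illustration, assuming a simple hard threshold `thr`; the paper's algorithm for identifying groups in correlation maps, its visualization, and the MEDA denoising step are not reproduced.

```python
import numpy as np

def group_sparse_factorization(X, thr=0.7):
    # Groups = connected components of the graph with an edge between
    # variables i and j whenever |corr(i, j)| > thr; one unit-norm loading
    # vector per group (leading right singular vector of that block).
    Xc = X - X.mean(axis=0)
    R = np.abs(np.corrcoef(Xc, rowvar=False))
    p = R.shape[1]
    labels = -np.ones(p, dtype=int)
    g = 0
    for i in range(p):
        if labels[i] >= 0:
            continue
        stack, labels[i] = [i], g
        while stack:                        # DFS over the correlation graph
            j = stack.pop()
            for k in range(p):
                if labels[k] < 0 and R[j, k] > thr:
                    labels[k] = g
                    stack.append(k)
        g += 1
    P = np.zeros((p, g))
    for c in range(g):
        idx = np.where(labels == c)[0]
        _, _, Vt = np.linalg.svd(Xc[:, idx], full_matrices=False)
        P[idx, c] = Vt[0]                   # sparse loading: zero off-group
    return P, labels
```

With two blocks of strongly correlated variables and negligible cross-correlation, the components recover the block structure exactly.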

9.
To address the insufficient representativeness of samples in surveys of below-scale (small) industrial enterprises, a calibration estimation method based on balanced samples is proposed, together with the corresponding estimator and its variance. The method uses a balanced sampling design at the design stage and calibration estimation at the estimation stage, thereby making maximal use of auxiliary information; data analysis shows that the calibration estimator based on balanced samples outperforms the Horvitz-Thompson (HT) estimator based on balanced sampling. In addition, to satisfy the assumption that the balancing variables are linearly independent, three approaches for handling correlated balancing variables are proposed: principal component analysis, sliced inverse regression, and sliced average variance estimation. The method has both theoretical and practical significance for improving China's surveys of below-scale industrial enterprises, and can be extended, where appropriate, to other surveys in Chinese government statistics.
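The calibration step of item 9 can be illustrated with the standard linear (GREG-type) calibration estimator: adjust the design weights d so that the weighted sample totals of the auxiliary variables match their known population totals exactly. This is a minimal sketch of calibration under a chi-square distance, not of the paper's balanced-sampling design or its variance estimator; the function name is hypothetical.

```python
import numpy as np

def calibrated_total(y, x, d, tx):
    # y: (n,) study variable; x: (n, q) auxiliary variables;
    # d: (n,) design weights; tx: (q,) known population totals of x.
    # Calibrated weights w = d * (1 + x @ beta) satisfy sum_i w_i x_i = tx.
    T = (x * d[:, None]).T @ x              # d-weighted X'X
    beta = np.linalg.solve(T, tx - d @ x)
    w = d * (1.0 + x @ beta)
    return w @ y, w
```

Since sum_i w_i x_i = d@x + (X'DX)beta = d@x + (tx - d@x) = tx, the calibration constraints hold exactly, and when y is approximately linear in x the calibrated total is much more stable than the plain HT total.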

10.
Quasi-Monte Carlo (QMC) methods have been playing an important role for high-dimensional problems in computational finance. Several techniques, such as the Brownian bridge (BB) and principal component analysis, are often used in QMC as possible ways to improve its performance. This paper proposes a new BB construction, which enjoys some interesting properties that appear useful in QMC methods. The basic idea is to choose each new step of a Brownian path according to a criterion that maximizes the variance explained by the new variable while holding all previously chosen steps fixed. It turns out that with this new construction, the first few variables are more "important" (in the sense of explained variance) than those in the ordinary BB construction, while the generation cost remains linear in the dimension. We present empirical studies of the proposed algorithm for pricing high-dimensional Asian options and American options, and demonstrate the usefulness of the new BB.
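The variance-maximizing principle in item 10 can be illustrated by greedily ordering the time points of a discretized Brownian path: repeatedly pick the point with the largest conditional variance given the points already fixed (the Brownian-bridge variance between its nearest fixed neighbours). This sketch only orders a given grid; on a dyadic grid it reproduces the ordinary BB ordering, whereas the paper's construction chooses the steps themselves and differs from the ordinary BB in what the leading variables explain.

```python
def greedy_bridge_order(times):
    # times: sorted positive time points of the path (t > 0).
    # Conditional variance of W(t) given fixed neighbours at l < t < r is
    # (t - l) * (r - t) / (r - l); with no fixed right neighbour it is t - l.
    chosen = []
    remaining = list(range(len(times)))
    while remaining:
        best, best_var = None, -1.0
        for i in remaining:
            t = times[i]
            left = max((times[j] for j in chosen if times[j] < t), default=0.0)
            right = min((times[j] for j in chosen if times[j] > t), default=None)
            if right is None:
                var = t - left                                   # free right end
            else:
                var = (t - left) * (right - t) / (right - left)  # bridge variance
            if var > best_var:
                best, best_var = i, var
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

For the dyadic grid 0.25, 0.5, 0.75, 1.0 the greedy order fixes the terminal point first, then the midpoint, then the quarter points, exactly as in the ordinary BB construction.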

