Similar Documents (20 results)
1.
Principal component analysis (PCA) is a widely used tool for data analysis and dimension reduction in applications throughout science and engineering. However, the principal components (PCs) can sometimes be difficult to interpret, because they are linear combinations of all the original variables. To facilitate interpretation, sparse PCA produces modified PCs with sparse loadings, i.e., loadings with very few non-zero elements. In this paper, we propose a new sparse PCA method, namely sparse PCA via regularized SVD (sPCA-rSVD). We use the connection of PCA with the singular value decomposition (SVD) of the data matrix and extract the PCs through solving a low-rank matrix approximation problem. Regularization penalties are introduced to the corresponding minimization problem to promote sparsity in the PC loadings. An efficient iterative algorithm is proposed for computation, and two tuning-parameter selection methods are discussed. Some theoretical results are established to justify the use of sPCA-rSVD when only the data covariance matrix is available. In addition, we give a modified definition of the variance explained by the sparse PCs. The sPCA-rSVD provides a uniform treatment of both classical multivariate data and high-dimension-low-sample-size (HDLSS) data. Further understanding of sPCA-rSVD and some existing alternatives is gained through simulation studies and real data examples, which suggest that sPCA-rSVD provides competitive results.
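
The rank-one step at the heart of this regularized-SVD formulation is short enough to prototype directly. Below is a minimal NumPy sketch of the alternating update the abstract describes (a soft-thresholded loading vector against a unit-norm score direction); the penalty level lam, the toy data, and the function names are illustrative assumptions, not the authors' reference code.

```python
import numpy as np

def soft_threshold(z, lam):
    """Elementwise soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def spca_rsvd_rank1(X, lam, n_iter=200, tol=1e-8):
    """Rank-one sparse PCA via regularized SVD: alternate between the
    unit-norm left vector u and a soft-thresholded right vector v that
    carries the sparse loadings."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], s[0] * Vt[0]                 # warm start from plain SVD
    for _ in range(n_iter):
        v_new = soft_threshold(X.T @ u, lam)     # sparse loadings step
        if np.linalg.norm(v_new) == 0:
            return u, v_new                      # lam too large: all loadings zero
        u_new = X @ v_new
        u_new /= np.linalg.norm(u_new)           # unit-norm score direction
        converged = np.linalg.norm(u_new - u) < tol
        u, v = u_new, v_new
        if converged:
            break
    return u, v / np.linalg.norm(v)              # normalized sparse loading vector

# Toy usage: a centered data matrix with 3 informative variables out of 10.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
X[:, :3] += 3 * rng.standard_normal((100, 1))    # shared strong component
X -= X.mean(axis=0)
u, v = spca_rsvd_rank1(X, lam=5.0)
print(np.round(v, 2))                            # most loadings shrink to exactly 0
```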

2.
In this paper, an ensemble technique combining principal component analysis (PCA) with the scale-dependent Lyapunov exponent (SDLE) is used to characterize the complexity of the precipitation dynamical system. The spatio-temporal precipitation data are decomposed with PCA, and the SDLE is then computed for the time series of the first few principal components (PCs). These PC time series are found to exhibit different scaling laws on different time scales. The study illustrates that the spatio-temporal precipitation data are chaotic and that the precipitation system is truly multiscaled and complex.
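
The decomposition step here is standard PCA via SVD. A sketch on synthetic data is below; the grid size and gamma-distributed rainfall field are invented for illustration, and the SDLE computation itself, which is specialized, is only indicated in a comment.

```python
import numpy as np

# Hypothetical spatio-temporal precipitation field: rows = months, columns = stations.
rng = np.random.default_rng(1)
field = rng.gamma(shape=2.0, scale=1.5, size=(600, 40))

anom = field - field.mean(axis=0)       # remove each station's climatology
U, s, Vt = np.linalg.svd(anom, full_matrices=False)
pcs = U * s                              # PC time series, one column per mode
expl = s**2 / np.sum(s**2)
print("variance explained by first 3 PCs:", np.round(expl[:3], 3))
# The paper's SDLE analysis would then be run on pcs[:, 0], pcs[:, 1], ...
```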

3.
An augmented Lagrangian approach for sparse principal component analysis
Principal component analysis (PCA) is a widely used technique for data analysis and dimension reduction with numerous applications in science and engineering. However, the standard PCA suffers from the fact that the principal components (PCs) are usually linear combinations of all the original variables, and it is thus often difficult to interpret the PCs. To alleviate this drawback, various sparse PCA approaches were proposed in the literature (Cadima and Jolliffe in J Appl Stat 22:203–214, 1995; d’Aspremont et al. in J Mach Learn Res 9:1269–1294, 2008; d’Aspremont et al. in SIAM Rev 49:434–448, 2007; Jolliffe in J Appl Stat 22:29–35, 1995; Journée et al. in J Mach Learn Res 11:517–553, 2010; Jolliffe et al. in J Comput Graph Stat 12:531–547, 2003; Moghaddam et al. in Advances in neural information processing systems 18:915–922, MIT Press, Cambridge, 2006; Shen and Huang in J Multivar Anal 99(6):1015–1034, 2008; Zou et al. in J Comput Graph Stat 15(2):265–286, 2006). Despite success in achieving sparsity, some important properties enjoyed by the standard PCA are lost in these methods, such as the uncorrelatedness of the PCs and the orthogonality of the loading vectors. Also, the total explained variance that they attempt to maximize can be too optimistic. In this paper we propose a new formulation for sparse PCA, aiming at finding sparse and nearly uncorrelated PCs with orthogonal loading vectors while explaining as much of the total variance as possible. We also develop a novel augmented Lagrangian method for solving a class of nonsmooth constrained optimization problems, which is well suited to our formulation of sparse PCA. We show that it converges to a feasible point and, moreover, under some regularity assumptions, converges to a stationary point. Additionally, we propose two nonmonotone gradient methods for solving the augmented Lagrangian subproblems, and establish their global and local convergence. Finally, we compare our sparse PCA approach with several existing methods on synthetic (Zou et al. in J Comput Graph Stat 15(2):265–286, 2006), Pitprops (Jeffers in Appl Stat 16:225–236, 1967), and gene expression data (Chin et al. in Cancer Cell 10:529–541, 2006), respectively. The computational results demonstrate that the sparse PCs produced by our approach substantially outperform those produced by other methods in terms of total explained variance, correlation of PCs, and orthogonality of loading vectors. Moreover, the experiments on random data show that our method is capable of solving large-scale problems within a reasonable amount of time.
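
The properties this abstract emphasizes (correlation between PCs, orthogonality of loading vectors, total explained variance) are easy to inspect for any sparse PCA output. The sketch below uses scikit-learn's SparsePCA as a generic stand-in, not the authors' augmented Lagrangian method, to show how those diagnostics are computed:

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 20))
X -= X.mean(axis=0)

spca = SparsePCA(n_components=3, alpha=1.0, random_state=0).fit(X)
scores = spca.transform(X)
W = spca.components_                        # rows = sparse loading vectors

# Correlation between sparse PCs (ideally near the identity):
print(np.round(np.corrcoef(scores, rowvar=False), 2))
# Orthogonality of loading vectors (ideally the identity):
print(np.round(W @ W.T, 2))
# Naive "total explained variance", the quantity the paper argues can be
# too optimistic when the PCs are correlated:
print(scores.var(axis=0, ddof=1).sum() / X.var(axis=0, ddof=1).sum())
```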

4.
Patient experience and satisfaction surveys have been adopted worldwide to evaluate healthcare quality. Nevertheless, national governments and the general public continue to search for optimal methods to assess healthcare quality from the patient's perspective. This study proposes a new hybrid method, which combines principal component analysis (PCA) and the evidential reasoning (ER) approach, for assessing patient satisfaction. PCA is utilized to transform correlated items into a few uncorrelated principal components (PCs). Then, the ER approach is employed to aggregate the extracted PCs, which are treated as multiple attributes or criteria within the ER framework. To benchmark the proposed method against another assessment method, the analytic hierarchy process (AHP) is employed to acquire the weight of each assessment item in the hierarchical assessment framework, and the ER approach is used to aggregate patient evaluations for each item. Compared with the combined AHP and ER approach, which relies on the respondents' subjective judgments to calculate criterion and subcriterion weights, the proposed method is highly objective and based entirely on survey data. This study contributes a novel hybrid method that can help hospital administrators obtain an objective, aggregated healthcare quality assessment based on patient experience.
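
A drastically simplified sketch of the pipeline: PCA compresses the correlated survey items, and the resulting components are aggregated with variance-based weights. The full ER algorithm combines belief distributions rather than taking a weighted mean, so the aggregation step below is only a placeholder, and the survey data are invented.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical survey: 300 patients answering 12 correlated 5-point items.
rng = np.random.default_rng(3)
latent = rng.normal(3.5, 0.8, size=(300, 1))
items = np.clip(np.round(latent + rng.normal(0, 0.6, (300, 12))), 1, 5)

pca = PCA(n_components=3).fit(items)
pcs = pca.transform(items)                  # a few uncorrelated components

# Placeholder aggregation: weight each PC by its share of explained variance.
# (The paper aggregates with the ER algorithm, which combines belief
# distributions rather than computing a plain weighted mean.)
weights = pca.explained_variance_ratio_ / pca.explained_variance_ratio_.sum()
overall = pcs @ weights
print("aggregated scores, first 5 patients:", np.round(overall[:5], 2))
```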

5.
Interval principal component analysis based on error theory and its application
For interval-valued samples, traditional principal component analysis must be extended. This paper first discusses the two main sources of interval sample data, namely observational error and symbolic data analysis. An interval number is then regarded as a midpoint-radius pair carrying a certain error and, starting from error theory, an interval PCA method based on the error-propagation formula is developed, yielding principal components that are themselves expressed as interval numbers. Finally, an empirical analysis is carried out on data from the Chinese stock market in the fourth quarter of 2005. The results show that, when facing massive data, interval PCA captures the overall properties of the sample more readily than traditional PCA.
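
A minimal sketch of the midpoint-radius idea: PCA is run on the midpoints, and the radii are pushed through the linear error-propagation formula |loadings| @ radii, so each score becomes an interval. The data and function name are illustrative, not the paper's exact estimator.

```python
import numpy as np

def interval_pca(mid, rad, k=2):
    """Interval PCA in the midpoint-radius representation: ordinary PCA on
    the midpoints, with the radii propagated through |V|, so each PC score
    is itself an interval [center - spread, center + spread]."""
    M = mid - mid.mean(axis=0)
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    V = Vt[:k].T                               # loading vectors, one per column
    center = M @ V                             # midpoint PC scores
    spread = rad @ np.abs(V)                   # error-propagated radii
    return center - spread, center + spread

rng = np.random.default_rng(4)
mid = rng.standard_normal((50, 6))
rad = 0.1 * rng.random((50, 6))                # per-cell measurement error
lo, hi = interval_pca(mid, rad)
print(np.round(np.c_[lo[0], hi[0]], 3))        # first observation's PC intervals
```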

6.
Principal component analysis (PCA) is often used to visualize data when the rows and the columns are both of interest. In such a setting, there is a lack of inferential methods on the PCA output. We study the asymptotic variance of a fixed-effects model for PCA, and propose several approaches to assessing the variability of PCA estimates: a method based on a parametric bootstrap, a new cell-wise jackknife, as well as a computationally cheaper approximation to the jackknife. We visualize the confidence regions by Procrustes rotation. Using a simulation study, we compare the proposed methods and highlight the strengths and drawbacks of each method as we vary the number of rows, the number of columns, and the strength of the relationships between variables.
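
Of the proposals, the parametric bootstrap is the simplest to sketch. Below, a rank-k fit supplies the fixed effects, Gaussian noise at the residual scale generates bootstrap tables, and an orthogonal Procrustes rotation (scipy.linalg.orthogonal_procrustes) removes the rotational indeterminacy before the variability is summarized; the noise model, dimensions, and names are assumptions.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(5)
n, p, k = 80, 6, 2
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, p)) \
    + 0.3 * rng.standard_normal((n, p))
X -= X.mean(axis=0)

def rank_k_fit(Y, k):
    """Truncated SVD: returns the rank-k fitted matrix and the loadings."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k], Vt[:k].T

Xhat, V_ref = rank_k_fit(X, k)
sigma = np.sqrt(np.mean((X - Xhat) ** 2))          # residual noise scale

boots = []
for _ in range(200):
    Xb = Xhat + sigma * rng.standard_normal(X.shape)   # parametric bootstrap draw
    _, Vb = rank_k_fit(Xb - Xb.mean(axis=0), k)
    R, _ = orthogonal_procrustes(Vb, V_ref)            # undo rotation indeterminacy
    boots.append(Vb @ R)

se = np.stack(boots).std(axis=0)
print("bootstrap SEs of the loadings:\n", np.round(se, 3))
```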

7.
This paper considers a previous article published by Zhu in the European Journal of Operational Research, which describes the joint use of data envelopment analysis (DEA) and principal component analysis (PCA) in the ranking of decision making units (DMUs). In Zhu's empirical study, DEA and PCA yield a consistent ranking. However, this paper finds that in certain instances DEA and PCA may yield inconsistent rankings. The PCA procedure adopted by Zhu is slightly modified in this article by incorporating other important features of ranking that Zhu did not consider. Numerical results reveal that both approaches rank consistently with DEA when the data set has a small number of efficient units. But when a majority of the DMUs in the sample are efficient, only the modified approach produces rankings consistent with DEA.
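
For concreteness, here is a hedged sketch of both rankings on invented data: an input-oriented CCR multiplier model solved with scipy.optimize.linprog, and a PC1 score computed from output/input ratios (one common convention in this literature, not necessarily Zhu's exact construction).

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """Input-oriented CCR efficiency of DMU o (multiplier form):
    maximize u'y_o subject to v'x_o = 1, u'y_j - v'x_j <= 0 for all j,
    with u, v >= 0. X: (n, m) inputs, Y: (n, s) outputs."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.concatenate([-Y[o], np.zeros(m)])             # maximize u'y_o
    A_ub = np.hstack([Y, -X])                             # u'y_j - v'x_j <= 0
    A_eq = np.concatenate([np.zeros(s), X[o]])[None, :]   # v'x_o = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=[1.0], bounds=[(0, None)] * (s + m))
    return -res.fun

rng = np.random.default_rng(6)
X_in = rng.uniform(1, 10, (15, 2))                        # 15 DMUs, 2 inputs
Y_out = rng.uniform(1, 10, (15, 3))                       # 3 outputs
dea = np.array([ccr_efficiency(X_in, Y_out, o) for o in range(15)])

# PCA ranking on output/input ratios.
ratios = np.column_stack([Y_out[:, r] / X_in[:, i]
                          for r in range(3) for i in range(2)])
Z = (ratios - ratios.mean(0)) / ratios.std(0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
v1 = Vt[0] if Vt[0].sum() > 0 else -Vt[0]                 # fix PC1 sign
pc1 = Z @ v1
print("DEA order:", np.argsort(-dea))
print("PCA order:", np.argsort(-pc1))                     # may disagree, as the paper notes
```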

8.
In the present study, a finite element-least squares point interpolation method (FE-LSPIM) is proposed for calculating the band structures of in-plane elastic waves in two-dimensional (2D) and three-dimensional (3D) phononic crystals (PCs). The method constructs new shape functions by combining mesh-free and finite element shape functions, exploiting the specific advantages of the mesh-free method and the finite element method (FEM). As a result, FE-LSPIM inherits the completeness properties of the mesh-free method and the compatibility properties of FEM, so the solutions obtained tend to be more accurate; indeed, according to our previous research, the method achieves excellent accuracy, especially in the high-frequency domain. Combined with Bloch's theorem, FE-LSPIM is applied to compute the band gaps (BGs) of 2D and 3D PCs, and several PCs are investigated to verify its accuracy in computing the BGs. Numerical analysis shows that the proposed method predicts the BGs more precisely than the FEM and a modified FEM.

9.
This research further develops the combined use of principal component analysis (PCA) and data envelopment analysis (DEA). The aim is to reduce the curse of dimensionality that occurs in DEA when there is an excessive number of inputs and outputs in relation to the number of decision-making units. Three separate PCA–DEA formulations are developed in the paper utilising the results of PCA to develop objective, assurance region type constraints on the DEA weights. The first model applies PCA to grouped data representing similar themes, such as quality or environmental measures. The second model, if needed, applies PCA to all inputs and separately to all outputs, thus further strengthening the discrimination power of DEA. The third formulation searches for a single set of global weights with which to fully rank all observations. In summary, it is clear that the use of principal components can noticeably improve the strength of DEA models.
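
The dimension-reduction step shared by the three formulations can be sketched in a few lines: each themed block of measures is replaced by the leading PCs that reach a target share of variance. The 90% threshold and the data are assumptions, and this sketch omits the paper's assurance-region constraints; note also that PC scores can be negative and would need shifting before entering a standard DEA model.

```python
import numpy as np

def pca_compress(block, var_target=0.9):
    """Replace a block of correlated measures by the few leading PCs that
    together explain at least var_target of the block's variance."""
    Z = (block - block.mean(axis=0)) / block.std(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    expl = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(expl, var_target) + 1)
    return U[:, :k] * s[:k]                       # PC scores of the block

rng = np.random.default_rng(7)
inputs = rng.uniform(1, 10, (30, 8))              # 8 correlated input measures
outputs = rng.uniform(1, 10, (30, 6))             # 6 correlated output measures
print(pca_compress(inputs).shape, pca_compress(outputs).shape)
# The compressed blocks would then enter a DEA model such as the
# ccr_efficiency sketch under item 7, easing the dimensionality problem.
```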

10.
Principal component analysis (PCA) is a popular dimension-reduction method to reduce the complexity and obtain the informative aspects of high-dimensional datasets. When the data distribution is skewed, data transformation is commonly used prior to applying PCA. Such transformation is usually obtained from previous studies, prior knowledge, or trial-and-error. In this work, we develop a model-based method that integrates data transformation in PCA and finds an appropriate data transformation using the maximum profile likelihood. Extensions of the method to handle functional data and missing values are also developed. Several numerical algorithms are provided for efficient computation. The proposed method is illustrated using simulated and real-world data examples. Supplementary materials for this article are available online.
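
A common, simpler stand-in for the paper's profile-likelihood transformation is a per-column maximum-likelihood Box-Cox transform (scipy.stats.boxcox) followed by ordinary PCA. The sketch below uses that stand-in on invented right-skewed data rather than the authors' integrated model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 5))  # right-skewed columns

transformed = np.empty_like(skewed)
lambdas = []
for j in range(skewed.shape[1]):
    # boxcox with lmbda=None returns the transformed column and the MLE of lambda
    transformed[:, j], lam = stats.boxcox(skewed[:, j])
    lambdas.append(lam)
print("fitted Box-Cox lambdas:", np.round(lambdas, 2))

Z = transformed - transformed.mean(axis=0)
_, s, _ = np.linalg.svd(Z, full_matrices=False)
print("explained variance ratios:", np.round(s**2 / np.sum(s**2), 3))
```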

11.
Principal component analysis is a multivariate statistical method frequently used in economics and management; it plays an important role in variable dimension reduction and is a powerful tool for multivariate comprehensive evaluation. Traditional PCA, however, is very sensitive to outliers, and its results are easily distorted by them; yet real data often contain such anomalies, whose influence is rarely considered in routine analysis. This paper proposes a robust PCA method based on the MCD (minimum covariance determinant) estimator. Simulation and empirical results show that the method is very effective at resisting outliers.
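
The construction translates directly into code: estimate a robust covariance with the MCD, then take its eigenvectors as the principal axes. A sketch using scikit-learn's MinCovDet on invented contaminated data:

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(9)
X = rng.multivariate_normal([0, 0, 0], [[3, 2, 0], [2, 3, 0], [0, 0, 1]], size=200)
X[:10] += 15                                    # plant a few gross outliers

mcd = MinCovDet(random_state=0).fit(X)
evals, evecs = np.linalg.eigh(mcd.covariance_)  # eigen-decompose robust covariance
loadings = evecs[:, np.argsort(evals)[::-1]]    # columns sorted by eigenvalue
scores = (X - mcd.location_) @ loadings         # robustly centered PC scores

classical = np.linalg.eigh(np.cov(X.T))[1][:, ::-1]
print("robust PC1:   ", np.round(loadings[:, 0], 2))
print("classical PC1:", np.round(classical[:, 0], 2))  # dragged toward the outliers
```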

12.
Most of the existing procedures for sparse principal component analysis (PCA) use a penalty function to obtain a sparse matrix of weights by which a data matrix is post-multiplied to produce PC scores. In this paper, we propose a new sparse PCA procedure which differs from the existing ones in two ways. First, the new procedure does not sparsify the weight matrix; instead, the so-called loadings matrix, by which the score matrix is post-multiplied to approximate the data matrix, is sparsified. Second, the cardinality of the loadings matrix, i.e., the total number of nonzero loadings, is pre-specified as an integer, without using penalty functions. The procedure is called unpenalized sparse loading PCA (USLPCA). A desirable property of USLPCA is that the indices for the percentages of explained variance can be defined in the same form as in standard PCA. We develop an alternating least squares algorithm for USLPCA which exploits the fact that the PCA loss function can be decomposed into a term that does not involve the loadings and another that is easily minimized under cardinality constraints. A procedure is also presented for selecting the best cardinality using information criteria. The procedures are assessed in a simulation study and illustrated with real data examples.
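
When the score matrix F is orthonormal, the loss splits as a constant plus ||A - X'F||², so the loadings step reduces to hard-thresholding X'F to its largest-magnitude entries. A minimal sketch of this alternating scheme; the dimensions and cardinality are illustrative, not the paper's defaults.

```python
import numpy as np

def uslpca(X, k=2, card=6, n_iter=100):
    """Cardinality-constrained sparse-loadings PCA via alternating least squares.

    Fits X ~ F @ A.T with orthonormal scores F and a loadings matrix A
    holding exactly `card` nonzero entries. Because F'F = I, the loss
    splits as const + ||A - X'F||^2, so the A-step is a hard threshold
    keeping the `card` largest-magnitude entries of X'F."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    A = Vt[:k].T
    for _ in range(n_iter):
        U, _, Wt = np.linalg.svd(X @ A, full_matrices=False)
        F = U @ Wt                                # F-step: orthogonal Procrustes
        B = X.T @ F                               # unconstrained loadings
        A = np.zeros_like(B)
        keep = np.argsort(np.abs(B), axis=None)[-card:]
        A.flat[keep] = B.flat[keep]               # A-step: keep `card` largest entries
    return F, A

rng = np.random.default_rng(10)
X = rng.standard_normal((60, 8))
X -= X.mean(axis=0)
F, A = uslpca(X)
print("nonzero loadings:", np.count_nonzero(A))
print(np.round(A, 2))
```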

13.
An ε-insensitive support vector machine method based on feature extraction is applied to nonlinear system identification. Kernel principal component features are first extracted from the input-output data, and the extracted features are then used as the training data for the support vector machine. The method is compared, in simulations with and without noise, against a method based on ordinary PCA feature extraction and against direct application of the ε-insensitive support vector machine. The results show that its fitting performance and noise immunity are superior to those of the other two methods.
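
A sketch of the pipeline on an invented nonlinear system, using scikit-learn's KernelPCA for the kernel principal component features and SVR for the ε-insensitive regression; the lag structure, kernel width, and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVR

# Hypothetical SISO nonlinear system: y depends on lagged inputs and outputs.
rng = np.random.default_rng(11)
u = rng.uniform(-1, 1, 500)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.6 * np.sin(y[t - 1]) + 0.3 * y[t - 2] + u[t - 1] ** 3

# Regressor matrix of lagged signals, then kernel PCA feature extraction.
Z = np.column_stack([y[1:-1], y[:-2], u[1:-1]])
target = y[2:]
kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.5)
features = kpca.fit_transform(Z)

# epsilon-insensitive support vector regression on the extracted features.
model = SVR(epsilon=0.01, C=10.0).fit(features[:400], target[:400])
pred = model.predict(features[400:])
print("test RMSE:", np.sqrt(np.mean((pred - target[400:]) ** 2)))
```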

14.
Principal component analysis (PCA) is one of the most popular multivariate data analysis techniques for dimension reduction and data mining, and is widely used in many fields ranging from industry and biology to finance and social development. When working with big data, it is often necessary to consider the online version of PCA, in which only a small subset of samples can be stored. To handle the online PCA problem, Oja (1982) presented the stochastic power method under the assumption of zero-mean samples, and there have been many theoretical analyses and modified versions of this method in recent years. However, the common circumstance where the samples have nonzero mean has seldom been studied. In this paper, we derive the convergence rate of a nonzero-mean version of Oja's algorithm with diminishing stepsizes. In the analysis, we succeed in handling the dependence between iterations caused by the updated mean term used for data centering. Furthermore, we verify the theoretical results by several numerical tests on both artificial and real datasets. Our work offers a way to deal with top-1 online PCA when the mean of the given data is unknown.
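
The iterate being analyzed is easy to state in code: maintain a running mean, center each incoming sample with it, and apply Oja's update with diminishing stepsize eta0/t. The sketch below is an illustration of that scheme, not the paper's exact analysis object.

```python
import numpy as np

def oja_nonzero_mean(stream, eta0=1.0):
    """Oja's top-1 online PCA with on-the-fly centering: a running mean
    centers each sample, and the centered sample drives Oja's update
    with diminishing stepsizes eta0 / t."""
    d = stream.shape[1]
    rng = np.random.default_rng(0)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    mean = np.zeros(d)
    for t, x in enumerate(stream, start=1):
        mean += (x - mean) / t                  # running mean update
        xc = x - mean                           # centered sample
        w += (eta0 / t) * xc * (xc @ w)         # Oja step
        w /= np.linalg.norm(w)                  # keep unit norm
    return w

rng = np.random.default_rng(12)
cov = np.diag([5.0, 1.0, 0.5, 0.2])
data = rng.multivariate_normal(mean=[10, -3, 4, 0], cov=cov, size=20000)
w = oja_nonzero_mean(data)
print("estimated top PC:", np.round(w, 3))      # ~ +/- e_1 despite the nonzero mean
```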

15.
Symbolic data analysis is an emerging data mining technique, and interval numbers are the most commonly used type of symbolic data. This paper studies a PCA method for interval-valued symbolic data and applies it to evaluating the overall market performance of stocks. The basic theory of symbolic data analysis is first introduced. The computation of empirical descriptive statistics for interval-valued samples is then studied, and, based on the empirical correlation matrix, an algorithm for interval PCA is given whose principal component values are themselves expressed as interval numbers. Finally, an empirical study is conducted on one week of trading data for 20 stocks listed on the Shanghai Stock Exchange; based on a rectangle plot of the interval principal component scores, the 20 stocks are classified into four groups according to their overall market performance.

16.
US experience shows that deregulation of the airline industry leads to the formation of hub-and-spoke (HS) airline networks. Viewing potential HS networks as decision-making units, we use data envelopment analysis (DEA) to select the most efficient network configurations from the many that are possible in the deregulated European Union airline market. To overcome the difficulties that DEA encounters when there is an excessive number of inputs or outputs, we employ principal component analysis (PCA) to aggregate certain clustered data, whilst ensuring very similar results to those achieved under the original DEA model. The DEA–PCA formulation is then illustrated with real-world data gathered from the West European air transportation industry.

17.
This paper proposes the application of a principal components proportional hazards regression model in condition-based maintenance (CBM) optimization. The Cox proportional hazards model with time-dependent covariates is considered. Principal component analysis (PCA) can be applied to covariates (measurements) to reduce the number of variables included in the model, as well as to eliminate possible collinearity between the covariates. The main issues and problems in using the proposed methodology are discussed. PCA is applied to a simulated CBM data set and two real data sets obtained from industry: oil analysis data and vibration data. Reasonable results are obtained.
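
A sketch of the modeling chain on invented condition-monitoring data: PCA decorrelates the measurements, and the leading PCs enter a Cox proportional hazards model, here fitted with the lifelines package. Note that this sketch uses time-independent covariates, whereas the paper's model has time-dependent ones; all names and the data-generating story are assumptions.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.decomposition import PCA

# Hypothetical condition-monitoring measurements (six correlated wear metrics).
rng = np.random.default_rng(13)
n = 300
wear = rng.exponential(1.0, n)
meas = wear[:, None] * np.ones((n, 6)) + 0.3 * rng.standard_normal((n, 6))

pcs = PCA(n_components=2).fit_transform(meas)   # decorrelated covariates
df = pd.DataFrame(pcs, columns=["pc1", "pc2"])
df["duration"] = rng.weibull(1.5, n) * np.exp(-0.5 * wear)  # shorter life when worn
df["event"] = (rng.random(n) < 0.8).astype(int)             # ~20% right-censored

cph = CoxPHFitter().fit(df, duration_col="duration", event_col="event")
cph.print_summary()                              # hazard ratios for pc1, pc2
```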

18.
This article compares two approaches to aggregating multiple inputs and multiple outputs in the evaluation of decision making units (DMUs): data envelopment analysis (DEA) and principal component analysis (PCA). DEA, a non-statistical efficiency technique, employs linear programming to weight the inputs/outputs and rank the performance of DMUs. PCA, a multivariate statistical method, constructs new composite measures from the multiple inputs/outputs. Both methods are applied to three real-world data sets that characterize the economic performance of Chinese cities and yield consistent and mutually complementary results. Nonparametric statistical tests are employed to validate the consistency between the rankings obtained from DEA and PCA.
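
One natural nonparametric check of ranking consistency is Spearman's rank correlation between the two score vectors; a sketch on invented scores (the abstract does not specify which particular tests were used):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical efficiency scores for 20 DMUs from the two methods.
rng = np.random.default_rng(14)
dea_scores = rng.uniform(0.5, 1.0, 20)
pca_scores = dea_scores + 0.05 * rng.standard_normal(20)  # broadly agreeing index

rho, pval = spearmanr(dea_scores, pca_scores)
print(f"Spearman rho = {rho:.2f}, p = {pval:.4f}")        # small p => consistent rankings
```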

19.
This article considers a new type of principal component analysis (PCA) that adaptively reflects the information in the data. Ordinary PCA is useful for dimension reduction and for identifying important features of multivariate data. However, it uses only the second moment of the data and is consequently inefficient for analyzing real observations that are skewed or asymmetric. To extend the scope of PCA to non-Gaussian data that cannot be well represented by the second moment, a new approach for PCA is proposed. The core of the methodology is to replace the conventional square loss with a composite asymmetric Huber function, defined as a weighted linear combination of modified Huber loss functions. A practical algorithm to implement the data-adaptive PCA is discussed. Results from numerical studies, including a simulation study and real data analysis, demonstrate the promising empirical properties of the proposed approach. Supplementary materials for this article are available online.
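
The composite asymmetric Huber loss requires the paper's own algorithm, but the flavor of Huber-type robustification of PCA can be sketched with a simple iteratively reweighted scheme in which observations far from the current subspace are down-weighted. This is a simplified symmetric variant, not the proposed method; the tuning constant and data are assumptions.

```python
import numpy as np

def huber_weighted_pca(X, k=1, c=1.345, n_iter=20):
    """PCA via iteratively reweighted moments with Huber-type weights:
    observations with large orthogonal distance to the current subspace
    get weight proportional to c / distance, damping their influence."""
    n = X.shape[0]
    w = np.ones(n)
    for _ in range(n_iter):
        mu = np.average(X, axis=0, weights=w)
        C = np.cov((X - mu).T, aweights=w)
        evals, evecs = np.linalg.eigh(C)
        V = evecs[:, np.argsort(evals)[::-1][:k]]
        resid = (X - mu) - (X - mu) @ V @ V.T
        d = np.linalg.norm(resid, axis=1)                     # orthogonal distance
        scale = np.median(d) / 0.6745 + 1e-12                 # robust scale estimate
        w = np.minimum(1.0, c * scale / np.maximum(d, 1e-12)) # Huber-type weights
    return mu, V

rng = np.random.default_rng(15)
X = rng.standard_normal((300, 3)) @ np.diag([3.0, 1.0, 0.3])
X[:15] += [0, 0, 20]                         # asymmetric contamination
mu, V = huber_weighted_pca(X)
print("robust PC1:", np.round(V[:, 0], 2))   # stays near the bulk's main axis
```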

20.
After a review of the classical procedures for estimation in the principal component analysis (PCA) of a second-order stochastic process, two alternative procedures are developed to approximate such estimates. The first is based on the orthogonal projection method and uses cubic interpolating splines when the data are discrete. The second is based on the trapezoidal method. The accuracy of both procedures is tested by simulating approximate sample functions of Brownian motion and the Brownian bridge; the true principal factors of these stochastic processes, which can be evaluated directly, are compared with those estimated by the two algorithms. An application to estimation in the PCA of tourism evolution in Spain from real data is also included.
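
The trapezoidal route is short to sketch for the Brownian motion case, where the exact Karhunen-Loève eigenvalues 1/((k - 1/2)²π²) are known for comparison: estimate the covariance from simulated paths, apply trapezoidal quadrature weights, and eigendecompose the symmetrized operator. The grid and number of paths are illustrative assumptions.

```python
import numpy as np

m = 500
t = np.linspace(0, 1, m + 1)[1:]                 # grid on (0, 1]; B(0) = 0 is dropped
w = np.full(m, 1.0 / m)                           # trapezoidal quadrature weights
w[-1] /= 2                                        # half weight at the right endpoint

# Simulate Brownian motion sample paths and estimate the covariance kernel.
rng = np.random.default_rng(16)
paths = np.cumsum(rng.standard_normal((2000, m)) / np.sqrt(m), axis=1)
K_hat = np.cov(paths.T)                           # empirical covariance ~ min(s, t)

# Trapezoidal discretization of the covariance operator, symmetrized so a
# standard symmetric eigensolver applies: D^{1/2} K D^{1/2}.
sw = np.sqrt(w)
evals = np.linalg.eigvalsh(sw[:, None] * K_hat * sw[None, :])[::-1]

exact = 1.0 / ((np.arange(1, 4) - 0.5) ** 2 * np.pi ** 2)
print("estimated:", np.round(evals[:3], 4))
print("exact:    ", np.round(exact, 4))           # 0.4053, 0.0450, 0.0162
```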
