Similar Articles (20 results)
1.
Principal component analysis (PCA) is an important tool for dimension reduction in multivariate analysis. Regularized PCA methods, such as sparse PCA and functional PCA, have been developed to incorporate special features in many real applications. Sometimes additional variables (referred to as supervision) are measured on the same set of samples, which can potentially drive low-rank structures of the primary data of interest. Classical PCA methods cannot make use of such supervision data. In this article, we propose a supervised sparse and functional principal component (SupSFPC) framework that can incorporate supervision information to recover underlying structures that are more interpretable. The framework unifies and generalizes several existing methods and flexibly adapts to the practical scenarios at hand. The SupSFPC model is formulated in a hierarchical fashion using latent variables. We develop an efficient modified expectation-maximization (EM) algorithm for parameter estimation. We also implement fast data-driven procedures for tuning parameter selection. Our comprehensive simulation and real data examples demonstrate the advantages of SupSFPC. Supplementary materials for this article are available online.
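The supervised model builds on a latent-variable (probabilistic) view of PCA. As a hedged illustration of the EM machinery involved, not the authors' SupSFPC algorithm itself, a minimal EM loop for plain probabilistic PCA (the unsupervised special case) might look like this; all names are ours:

```python
import numpy as np

def ppca_em(X, k, n_iter=200, seed=0):
    """EM for probabilistic PCA (Tipping & Bishop): the unsupervised
    latent-variable building block underlying models like SupSFPC."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W = rng.normal(size=(d, k))      # factor loadings
    sigma2 = 1.0                     # isotropic noise variance
    for _ in range(n_iter):
        # E-step: posterior moments of the latent scores z_n
        M = W.T @ W + sigma2 * np.eye(k)
        Minv = np.linalg.inv(M)
        Ez = Xc @ W @ Minv                      # (n, k): E[z_n]
        Ezz = n * sigma2 * Minv + Ez.T @ Ez     # sum_n E[z_n z_n']
        # M-step: update loadings, then noise variance
        W = Xc.T @ Ez @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc**2)
                  - 2 * np.sum(Ez * (Xc @ W))
                  + np.trace(Ezz @ W.T @ W)) / (n * d)
    return W, sigma2, mu

X = np.random.default_rng(1).normal(size=(100, 5))
W, s2, mu = ppca_em(X, k=2)
print(W.shape, round(s2, 3))
```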

2.
This research further develops the combined use of principal component analysis (PCA) and data envelopment analysis (DEA). The aim is to reduce the curse of dimensionality that occurs in DEA when there is an excessive number of inputs and outputs in relation to the number of decision-making units. Three separate PCA–DEA formulations are developed in the paper utilising the results of PCA to develop objective, assurance region type constraints on the DEA weights. The first model applies PCA to grouped data representing similar themes, such as quality or environmental measures. The second model, if needed, applies PCA to all inputs and separately to all outputs, thus further strengthening the discrimination power of DEA. The third formulation searches for a single set of global weights with which to fully rank all observations. In summary, it is clear that the use of principal components can noticeably improve the strength of DEA models.
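For readers who want to see the building block being constrained, below is a minimal input-oriented CCR model in multiplier form; the paper's PCA-derived assurance-region constraints would enter as additional inequality rows, which this sketch omits:

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """Input-oriented CCR efficiency of DMU o in multiplier form.
    X: (n, m) inputs, Y: (n, s) outputs. Variables are [u, v] >= 0."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.concatenate([-Y[o], np.zeros(m)])           # maximize u . y_o
    A_eq = np.concatenate([np.zeros(s), X[o]])[None]   # v . x_o = 1
    b_eq = [1.0]
    A_ub = np.hstack([Y, -X])                          # u.y_j - v.x_j <= 0
    b_ub = np.zeros(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (s + m))
    return -res.fun                                    # efficiency score

rng = np.random.default_rng(0)
X, Y = rng.uniform(1, 10, (8, 3)), rng.uniform(1, 10, (8, 2))
print([round(ccr_efficiency(X, Y, o), 3) for o in range(8)])
```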

3.
We present the basic idea of abstract principal component analysis (APCA) as a general approach that extends various popular data analysis techniques such as PCA and GPCA. We describe the mathematical theory behind APCA and focus on a particular application to mode extraction from a data set of mixed temporal and spatial signals. For illustration, algorithmic implementation details and numerical examples are presented for the extraction of a number of basic types of wave modes, including, in particular, dynamic modes involving spatial shifts.

4.
Kernel principal component analysis (KPCA) extends linear PCA from a real vector space to any high-dimensional kernel feature space. The sensitivity of linear PCA to outliers is well known, and various robust alternatives have been proposed in the literature. For KPCA, such robust versions have received considerably less attention. In this article, we present kernel versions of three robust PCA algorithms: spherical PCA, projection pursuit, and ROBPCA. These robust KPCA algorithms are analyzed in a classification context by applying discriminant analysis to the KPCA scores. The performance of the different robust KPCA algorithms is studied in a simulation comparing misclassification percentages on both clean and contaminated data. An outlier map is constructed to visualize outliers in such classification problems. A real-life example from protein classification illustrates the usefulness of robust KPCA and its corresponding outlier map.
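As a point of reference for the robust variants, a bare-bones (non-robust) RBF kernel PCA via the double-centered Gram matrix can be sketched as follows; the robust versions replace the eigen-step with spherical PCA, projection pursuit, or ROBPCA machinery:

```python
import numpy as np

def kernel_pca(X, k, gamma=1.0):
    """Plain (non-robust) RBF kernel PCA: the baseline that the robust
    variants in the article modify."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                          # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]
    return vecs * np.sqrt(np.clip(vals, 0, None))  # KPCA scores

X = np.random.default_rng(0).normal(size=(50, 4))
print(kernel_pca(X, k=2).shape)   # (50, 2)
```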

5.
This article proposes a new approach to principal component analysis (PCA) for interval-valued data. Unlike classical observations, which are represented by single points in the p-dimensional space ℝ^p, interval-valued observations are represented by hyper-rectangles in ℝ^p, and as such, have an internal structure that does not exist in classical observations. As a consequence, statistical methods for classical data must be modified to account for the structure of the hyper-rectangles before they can be applied to interval-valued data. This article extends the classical PCA method to interval-valued data by using the so-called symbolic covariance to determine the principal component (PC) space to reflect the total variation of interval-valued data. The article also provides a new approach to constructing the observations in a PC space for better visualization. This new representation of the observations reflects their true structure in the PC space. Supplementary materials for this article are available online.
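For contrast, the sketch below implements the classical vertices method, a simple baseline for interval PCA that replaces each hyper-rectangle with its 2^p corner points; the article's symbolic-covariance construction is a different approach and is not reproduced here:

```python
import itertools
import numpy as np

def vertices_pca(lo, hi, k=2):
    """Baseline interval PCA: expand each hyper-rectangle [lo, hi] into
    its 2^p vertices and run ordinary PCA on the stacked vertices."""
    n, p = lo.shape
    corners = []
    for i in range(n):
        for bits in itertools.product([0, 1], repeat=p):
            corners.append([hi[i, j] if b else lo[i, j]
                            for j, b in enumerate(bits)])
    V = np.asarray(corners)
    Vc = V - V.mean(axis=0)
    _, _, Vt = np.linalg.svd(Vc, full_matrices=False)
    return Vc @ Vt[:k].T          # vertex scores in the PC space

rng = np.random.default_rng(0)
mid = rng.normal(size=(10, 3))
rad = rng.uniform(0.1, 0.5, (10, 3))
scores = vertices_pca(mid - rad, mid + rad)
print(scores.shape)   # (80, 2): 10 rectangles x 2^3 vertices each
```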

6.
Most of the existing procedures for sparse principal component analysis (PCA) use a penalty function to obtain a sparse matrix of weights by which a data matrix is post-multiplied to produce PC scores. In this paper, we propose a new sparse PCA procedure that differs from the existing ones in two ways. First, the new procedure does not sparsify the weight matrix. Instead, it sparsifies the so-called loadings matrix, by which the score matrix is post-multiplied to approximate the data matrix. Second, the cardinality of the loadings matrix, i.e., the total number of nonzero loadings, is pre-specified as an integer, without the use of penalty functions. The procedure is called unpenalized sparse loading PCA (USLPCA). A desirable property of USLPCA is that indices for the percentages of explained variance can be defined in the same form as in standard PCA. We develop an alternating least squares algorithm for USLPCA which exploits the fact that the PCA loss function can be decomposed as the sum of a term that does not involve the loadings and another that is easily minimized under cardinality constraints. A procedure is also presented for selecting the best cardinality using information criteria. The procedures are assessed in a simulation study and illustrated with real data examples.
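Reading that description literally, the alternating scheme admits a compact reconstruction: with orthonormal scores fixed, the cardinality-constrained loadings update is a hard-thresholding of X'F, and the score update is an orthogonal Procrustes step. The sketch below is our reconstruction under those assumptions, not the authors' code:

```python
import numpy as np

def uslpca(X, k, card, n_iter=100, seed=0):
    """Sketch of unpenalized sparse-loading PCA: X ~ F @ A.T with
    F'F = I and exactly `card` nonzero entries in A."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    F = np.linalg.qr(rng.normal(size=(n, k)))[0]   # orthonormal scores
    for _ in range(n_iter):
        # Loadings step: with F'F = I the loss separates over entries of
        # A, so keep the `card` largest |(X'F)_ij| and zero the rest.
        A = X.T @ F
        thresh = np.sort(np.abs(A), axis=None)[-card]
        A[np.abs(A) < thresh] = 0.0
        # Score step: orthogonal Procrustes, F = U V' from svd(X A).
        U, _, Vt = np.linalg.svd(X @ A, full_matrices=False)
        F = U @ Vt
    return F, A

X = np.random.default_rng(1).normal(size=(60, 8))
F, A = uslpca(X, k=2, card=6)
print(np.count_nonzero(A))   # 6 nonzero loadings
```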

7.
Principal component analysis (PCA) has been a prominent tool for high-dimensional data analysis. Online algorithms that estimate the principal component by processing streaming data are of tremendous practical and theoretical interest. Despite its rich applications, the theoretical convergence analysis of online PCA remains largely open. In this paper, we cast online PCA as a stochastic nonconvex optimization problem, and we analyze the online PCA algorithm as a stochastic approximation iteration. The stochastic approximation iteration processes data points incrementally and maintains a running estimate of the principal component. We prove for the first time a nearly optimal finite-sample error bound for the online PCA algorithm. Under the subgaussian assumption, we show that the finite-sample error bound closely matches the minimax information lower bound.
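In its simplest form, the stochastic-approximation iteration in question is Oja's rule; a hedged sketch (the step size and data below are illustrative, not the paper's settings):

```python
import numpy as np

def oja_online_pca(stream, dim, eta=0.01, seed=0):
    """Oja's rule: a one-pass stochastic-approximation estimate of the
    leading principal component from streaming data."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)
    w /= np.linalg.norm(w)
    for x in stream:                 # process points incrementally
        w += eta * x * (x @ w)       # stochastic gradient step
        w /= np.linalg.norm(w)       # retract to the unit sphere
    return w

# Synthetic stream whose top PC is (1, 0, 0) up to sign.
rng = np.random.default_rng(1)
stream = rng.normal(size=(5000, 3)) * np.array([3.0, 1.0, 0.5])
print(np.round(oja_online_pca(stream, dim=3), 2))
```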

8.
The article begins with a review of the main approaches to interpreting the results of principal component analysis (PCA) over the last 50–60 years. The simple structure approach is compared to the modern approach of sparse PCA, where interpretable solutions are obtained directly. It is shown that their goals are identical, but that they differ in the way they are realized. Next, the most popular and influential methods for sparse PCA are briefly reviewed. In the remainder of the paper, a new approach to defining sparse PCA is introduced. Several alternative definitions are considered and illustrated on a well-known data set. Finally, it is demonstrated how one of these versions of sparse PCA can be used as a sparse alternative to the classical rotation methods.

9.
Dimensionality reduction is an important technique in surrogate modeling and machine learning. In this article, we propose a supervised dimensionality reduction method, “least squares regression principal component analysis” (LSR-PCA), applicable to both classification and regression problems. To show the efficacy of this method, we present different examples in visualization, classification, and regression problems, comparing it with several state-of-the-art dimensionality reduction methods. Finally, we present a kernel version of LSR-PCA for problems where the inputs are correlated nonlinearly. The examples demonstrate that LSR-PCA can be a competitive dimensionality reduction method.

10.
This article compares two approaches to aggregating multiple inputs and multiple outputs in the evaluation of decision making units (DMUs): data envelopment analysis (DEA) and principal component analysis (PCA). DEA, a non-statistical efficiency technique, employs linear programming to weight the inputs/outputs and rank the performance of DMUs. PCA, a multivariate statistical method, constructs new composite measures from the multiple inputs/outputs. Both methods are applied to three real-world data sets that characterize the economic performance of Chinese cities, and they yield consistent and mutually complementary results. Nonparametric statistical tests are employed to validate the consistency between the rankings obtained from DEA and PCA.
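The rank-consistency check can be reproduced with standard nonparametric statistics; the paper does not name its exact tests, so take Spearman and Kendall below as plausible choices, applied to hypothetical scores:

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

# Hypothetical efficiency scores for 10 DMUs from the two methods.
dea = np.array([0.91, 0.78, 1.00, 0.64, 0.85, 0.59, 0.73, 0.97, 0.68, 0.81])
pca = np.array([0.88, 0.74, 0.99, 0.61, 0.80, 0.63, 0.70, 0.95, 0.66, 0.84])

rho, p_rho = spearmanr(dea, pca)   # rank correlation of the two rankings
tau, p_tau = kendalltau(dea, pca)  # concordance of pairwise orderings
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.4f})")
print(f"Kendall tau  = {tau:.3f} (p = {p_tau:.4f})")
```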

11.
In practical applications related to, for instance, machine learning, data mining and pattern recognition, one is commonly dealing with noisy data lying near some low-dimensional manifold. A well-established tool for extracting the intrinsically low-dimensional structure from such data is principal component analysis (PCA). Due to the inherent limitations of this linear method, its extensions to extraction of nonlinear structures have attracted increasing research interest in recent years. Assuming a generative model for noisy data, we develop a probabilistic approach for separating the data-generating nonlinear functions from noise. We demonstrate that ridges of the marginal density induced by the model are viable estimators for the generating functions. For projecting a given point onto a ridge of its estimated marginal density, we develop a generalized trust region Newton method and prove its convergence to a ridge point. Accuracy of the model and computational efficiency of the projection method are assessed via numerical experiments where we utilize Gaussian kernels for nonparametric estimation of the underlying densities of the test datasets.
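The paper's projection step is a generalized trust-region Newton method; a simpler, related ridge-projection scheme, subspace-constrained mean shift with a Gaussian kernel, conveys the idea and is sketched below (our illustration, not the authors' algorithm):

```python
import numpy as np

def scms_project(x, data, h=0.5, d=1, n_iter=200, tol=1e-6):
    """Project x onto a d-dimensional ridge of a Gaussian KDE of `data`
    via subspace-constrained mean shift."""
    D = data.shape[1]
    for _ in range(n_iter):
        diff = data - x                                   # (n, D)
        w = np.exp(-np.sum(diff**2, axis=1) / (2 * h**2))
        # Hessian of the KDE at x (positive constants dropped).
        H = (diff * w[:, None]).T @ diff / h**4 - w.sum() * np.eye(D) / h**2
        vals, vecs = np.linalg.eigh(H)
        V = vecs[:, :D - d]           # normal space: smallest eigenvalues
        m = (w @ data) / w.sum() - x  # mean-shift vector
        step = V @ (V.T @ m)          # constrain the shift to span(V)
        x = x + step
        if np.linalg.norm(step) < tol:
            break
    return x

# Noisy points near a parabola; project one point onto the density ridge.
rng = np.random.default_rng(0)
t = rng.uniform(-2, 2, 400)
data = np.c_[t, t**2] + 0.1 * rng.normal(size=(400, 2))
print(scms_project(np.array([0.5, 1.5]), data))
```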

12.
This paper presents an effective and efficient kernel approach to recognizing image sets, each of which is represented as a point on an extended Grassmannian manifold. Several recent studies focus on the applicability of discriminant analysis on the Grassmannian manifold but fail to capture the inherent nonlinear structure of the data. We therefore propose an extension of the Grassmannian manifold to address this issue. Instead of using a linear data embedding with PCA, we develop a nonlinear data embedding of such manifolds using kernel PCA. The paper makes three main contributions: 1) it introduces a nonlinear data embedding of the extended Grassmannian manifold, 2) it derives a distance metric on the Grassmannian manifold, and 3) it develops an effective and efficient Grassmannian kernel for SVM classification. The extended Grassmannian manifold arises naturally in recognition based on image sets, such as face and object recognition. Experiments on several standard databases show improved classification accuracy. Furthermore, experimental results indicate that our proposed approach significantly reduces time complexity in comparison with graph embedding discriminant analysis.
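One common concrete choice for such a kernel is the Grassmann projection kernel k(Y1, Y2) = ||Y1'Y2||_F^2 on orthonormal bases of the image-set subspaces; whether this matches the paper's exact kernel is an assumption, but it illustrates the pipeline:

```python
import numpy as np
from sklearn.svm import SVC

def subspace_basis(imgs, k=3):
    """Orthonormal basis of the k-dim subspace spanned by an image set
    (rows = vectorized images)."""
    _, _, Vt = np.linalg.svd(imgs - imgs.mean(0), full_matrices=False)
    return Vt[:k].T                       # (dim, k)

def projection_kernel(bases_a, bases_b):
    """Grassmann projection kernel: k(Yi, Yj) = ||Yi.T @ Yj||_F^2."""
    return np.array([[np.sum((Yi.T @ Yj)**2) for Yj in bases_b]
                     for Yi in bases_a])

# Two toy classes of image sets, each set drawn from a class subspace.
rng = np.random.default_rng(0)
dim, k = 30, 3
B0, B1 = rng.normal(size=(dim, k)), rng.normal(size=(dim, k))
sets, y = [], []
for c, B in enumerate((B0, B1)):
    for _ in range(10):
        Z = rng.normal(size=(20, k))
        sets.append(Z @ B.T + 0.1 * rng.normal(size=(20, dim)))
        y.append(c)

bases = [subspace_basis(s) for s in sets]
K = projection_kernel(bases, bases)
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))   # near 1.0 on this easy toy problem
```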

13.
Singular value decomposition (SVD) is a useful tool in functional data analysis (FDA). Compared to principal component analysis (PCA), SVD is more fundamental, because SVD simultaneously provides the PCAs in both row and column spaces. We compare SVD and PCA from the FDA viewpoint, and extend the usual SVD to variations by considering different centerings. A generalized scree plot is proposed to select an appropriate centering in practice. Several useful matrix views of the SVD components are introduced to explore different features in data, including SVD surface plots, image plots, curve movies, and rotation movies. These methods visualize both column and row information of a two-way matrix simultaneously, relate the matrix to relevant curves, show local variations, and highlight interactions between columns and rows. Several toy examples are designed to compare the different variations of SVD, and real data examples are used to illustrate the usefulness of the visualization methods.
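The role of centering is easy to see numerically; a small sketch comparing the SVD of raw, column-centered, row-centered, and double-centered versions of a matrix (the column-centered case is exactly standard PCA):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5)) + np.arange(5)        # columns have offsets

def top_singular_values(A, k=3):
    return np.round(np.linalg.svd(A, compute_uv=False)[:k], 2)

col_centered = X - X.mean(axis=0)                 # usual PCA centering
row_centered = X - X.mean(axis=1, keepdims=True)
double_centered = col_centered - col_centered.mean(axis=1, keepdims=True)

for name, A in [("raw", X), ("column-centered", col_centered),
                ("row-centered", row_centered),
                ("double-centered", double_centered)]:
    print(f"{name:16s}: {top_singular_values(A)}")

# Column-centered SVD reproduces PCA: right singular vectors are the PCs.
_, _, Vt = np.linalg.svd(col_centered, full_matrices=False)
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
print(np.allclose(np.abs(Vt[0]), np.abs(evecs[:, -1])))  # True
```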

14.
In this article, we propose a new framework for matrix factorization based on principal component analysis (PCA) where sparsity is imposed. The structure to impose sparsity is defined in terms of groups of correlated variables found in correlation matrices or maps. The framework is based on three new contributions: an algorithm to identify the groups of variables in correlation maps, a visualization for the resulting groups, and a matrix factorization. Together with a method to compute correlation maps with minimum noise level, referred to as missing-data for exploratory data analysis (MEDA), these three contributions constitute a complete matrix factorization framework. Two real examples are used to illustrate the approach and compare it with PCA, sparse PCA, and structured sparse PCA. Supplementary materials for this article are available online.

15.
An ε-insensitive support vector machine method based on feature extraction is applied to nonlinear system identification. Kernel principal component feature extraction is first performed on the input-output data, and the extracted features are then used as training data for the support vector machine. The method is compared, in simulations with and without noise, against a method based on (linear) principal component feature extraction and against direct application of the ε-insensitive support vector machine; the results show that its fitting performance and noise robustness are superior to those of the other two methods.
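A minimal version of this pipeline with scikit-learn (our illustrative stand-in for the paper's setup; the toy system, kernel, bandwidth, and ε values are arbitrary choices):

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVR

# Toy nonlinear system: y depends nonlinearly on lagged inputs, plus noise.
rng = np.random.default_rng(0)
U = rng.uniform(-1, 1, size=(400, 6))     # lagged input/output data
y = np.sin(U[:, 0]) * U[:, 1] + 0.5 * U[:, 2]**2 + 0.05 * rng.normal(size=400)

# Step 1: kernel principal component feature extraction.
kpca = KernelPCA(n_components=3, kernel="rbf", gamma=0.5)
Z_train = kpca.fit_transform(U[:300])
Z_test = kpca.transform(U[300:])

# Step 2: train an epsilon-insensitive SVM regressor on the KPCA features.
svr = SVR(kernel="rbf", epsilon=0.05).fit(Z_train, y[:300])
print("test R^2:", round(svr.score(Z_test, y[300:]), 3))
```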

16.
An analysis of three questions about principal component analysis
Starting from the basic principles of principal component analysis (PCA), this paper examines three questions that students often find puzzling: 1. Why are the principal component coefficients the eigenvectors of the covariance matrix of the original variables after standardization by their standard deviations (i.e., of the correlation matrix)? 2. How should the signs of the eigenvectors be chosen, and how does this choice affect further analyses such as composite scoring and cluster analysis? 3. How are the principal component loadings obtained? The paper also points out that some textbooks confuse principal component loadings with eigenvectors when computing principal component scores, which leads to erroneous results.
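A short numerical illustration of all three points, showing standardization, a sign convention, loadings as sqrt(eigenvalue)-scaled eigenvectors, and why scores must use eigenvectors rather than loadings:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[1, .6, .2], [0, 1, .5], [0, 0, 1.]])

# 1. Standardize; the covariance of Z is the correlation matrix of X,
#    and its eigenvectors are the principal component coefficients.
Z = (X - X.mean(0)) / X.std(0, ddof=1)
R = np.corrcoef(X, rowvar=False)
evals, evecs = np.linalg.eigh(R)
evals, evecs = evals[::-1], evecs[:, ::-1]       # descending order

# 2. Sign choice: flip each eigenvector so its largest entry is positive;
#    signs do not change variances, but they flip score/loading signs.
evecs *= np.sign(evecs[np.abs(evecs).argmax(axis=0), range(3)])

# 3. Loadings = sqrt(eigenvalue) * eigenvector = corr(component, variable).
loadings = evecs * np.sqrt(evals)

scores = Z @ evecs     # correct: scores use eigenvectors...
wrong = Z @ loadings   # ...not loadings (the textbook mistake).
print(np.round(scores.var(axis=0, ddof=1), 3))   # equals the eigenvalues
print(np.round(evals, 3))
```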

17.
Patient experience and satisfaction surveys have been adopted worldwide to evaluate healthcare quality. Nevertheless, national governments and the general public continue to search for optimal methods to assess healthcare quality from the patient’s perspective. This study proposes a new hybrid method, which combines principal component analysis (PCA) and the evidential reasoning (ER) approach, for assessing patient satisfaction. PCA is utilized to transform correlated items into a few uncorrelated principal components (PCs). Then, the ER approach is employed to aggregate the extracted PCs, which are treated as multiple attributes or criteria within the ER framework. To compare the performance of the proposed method with that of another assessment method, the analytic hierarchy process (AHP) is employed to acquire the weight of each assessment item in the hierarchical assessment framework, and the ER approach is used to aggregate patient evaluations for each item. Compared with the combined AHP and ER approach, which relies on the respondents’ subjective judgments to calculate criterion and subcriterion weights in the assessment framework, the proposed method is highly objective and based entirely on survey data. This study contributes a novel hybrid method that can help hospital administrators obtain an objective and aggregated healthcare quality assessment based on patient experience.

18.
We develop time series analysis of functional data observed discretely, treating the whole curve as a random realization from a distribution on functions that evolve over time. The method consists of principal component analysis of the functional data, followed by modeling the principal component scores as a vector autoregressive moving average (VARMA) process. We justify the method by showing that an underlying ARMAH structure of the curves leads to a VARMA structure on the principal component scores. We derive asymptotic properties of the estimators, fits, and forecasts. For term structures of interest rates, these provide a unified framework for studying the time and maturity components of interest rates in one setup with few parametric assumptions. We apply the method to the yield curves of the USA and India. We compare our forecasts to those of a parametric model based on Nelson–Siegel curves. In another application, we study the dependence of the long-term interest rate on the short-term interest rate using functional regression.
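The pipeline (functional PCA on discretized curves, then a vector time-series model on the scores) can be sketched with a VAR(1) standing in for the paper's VARMA, a simplification made here for brevity:

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
T, n_grid, k = 200, 12, 2
grid = np.linspace(0.25, 10, n_grid)              # maturities in years

# Toy yield curves: level + slope factors with simple dynamics.
level = np.cumsum(0.1 * rng.normal(size=T)) + 3
slope = np.zeros(T)
for t in range(1, T):
    slope[t] = 0.8 * slope[t - 1] + 0.1 * rng.normal()
curves = level[:, None] + slope[:, None] * np.exp(-grid / 3)

# FPCA: PCA of the discretized, centered curves.
mean_curve = curves.mean(axis=0)
C = curves - mean_curve
_, _, Vt = np.linalg.svd(C, full_matrices=False)
scores = C @ Vt[:k].T                             # (T, k) FPC scores

# Model the score series with a VAR(1) and forecast 5 steps ahead.
fc_scores = VAR(scores).fit(1).forecast(scores[-1:], steps=5)
fc_curves = mean_curve + fc_scores @ Vt[:k]       # back to curves
print(fc_curves.shape)                            # (5, 12)
```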

19.
Interval principal component analysis based on error theory and its application
Classical principal component analysis must be extended to handle interval-valued samples. This paper first discusses the two main sources of interval-valued sample data: observation error and symbolic data analysis. An interval number is then treated as a number carrying a certain error, characterized by a midpoint and a radius; starting from error theory, an interval PCA method based on the error-propagation formula is developed, yielding principal components expressed as interval numbers. Finally, an empirical analysis is conducted on data from China's stock market in the fourth quarter of 2005. The results show that, when faced with massive data, interval PCA captures the overall properties of the sample more readily than classical PCA.
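Under this midpoint-radius reading, the error-propagation step for a linear combination is simple: the radius of a PC score is the |coefficient|-weighted sum of the variable radii. A hedged sketch of that reconstruction (not the authors' code):

```python
import numpy as np

def interval_pca(mid, rad, k=2):
    """Interval PCA via midpoints + error propagation: run PCA on the
    interval midpoints, then propagate radii through |loadings|."""
    center = mid - mid.mean(axis=0)
    _, _, Vt = np.linalg.svd(center, full_matrices=False)
    V = Vt[:k].T                       # (p, k) PC directions
    score_mid = center @ V             # midpoint of each PC score
    score_rad = rad @ np.abs(V)        # propagated radius of each score
    return score_mid, score_rad        # scores are intervals mid +/- rad

rng = np.random.default_rng(0)
mid = rng.normal(size=(30, 4))
rad = rng.uniform(0.05, 0.3, size=(30, 4))
sm, sr = interval_pca(mid, rad)
print(sm[:2], sr[:2], sep="\n")
```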

20.
Principal component analysis (PCA) is often used to visualize data when the rows and the columns are both of interest. In such a setting, there is a lack of inferential methods for the PCA output. We study the asymptotic variance of a fixed-effects model for PCA and propose several approaches to assessing the variability of PCA estimates: a method based on a parametric bootstrap, a new cell-wise jackknife, and a computationally cheaper approximation to the jackknife. We visualize the confidence regions by Procrustes rotation. Using a simulation study, we compare the proposed methods and highlight the strengths and drawbacks of each as we vary the number of rows, the number of columns, and the strength of the relationships between variables.
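A minimal version of the parametric-bootstrap variant with Procrustes alignment of the bootstrapped loadings to the point estimate; the noise model and the crude variance estimate below are our simplifying assumptions:

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def pca_fit(X, k):
    """Return the centered data, its rank-k fit, and the k loadings."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc, (U[:, :k] * s[:k]) @ Vt[:k], Vt[:k].T

rng = np.random.default_rng(0)
n, p, k, B = 80, 6, 2, 200
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, p)) \
    + 0.3 * rng.normal(size=(n, p))

Xc, Xhat, V = pca_fit(X, k)
sigma2 = np.mean((Xc - Xhat)**2)           # crude residual noise variance

boot = []
for _ in range(B):
    Xb = Xhat + np.sqrt(sigma2) * rng.normal(size=(n, p))  # parametric draw
    _, _, Vb = pca_fit(Xb, k)
    R, _ = orthogonal_procrustes(Vb, V)    # Procrustes-align to the estimate
    boot.append(Vb @ R)

se = np.asarray(boot).std(axis=0)          # cell-wise loading variability
print(np.round(se, 3))
```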
