1.
We produce approximation bounds on a semidefinite programming relaxation for sparse principal component analysis. The sparse maximum eigenvalue problem cannot be efficiently approximated up to a constant approximation ratio, so our bounds depend on the optimum value of the semidefinite relaxation: the higher this value, the better the approximation. In particular, these bounds allow us to control approximation ratios for tractable statistics in hypothesis testing problems where data points are sampled from Gaussian models with a single sparse leading component.
2.
Alessio Farcomeni 《Computational Statistics》2009,24(4):583-604
We show a branch and bound approach to exactly find the best sparse dimension reduction of a matrix. We can choose between
enforcing orthogonality of the coefficients and uncorrelation of the components, and can explicitly set the degree of sparsity.
We suggest methods to choose the number of non-zero loadings for each component, and we illustrate and compare our approach with
existing methods on a benchmark data set.
3.
Principal component analysis (PCA) is a widely used technique for data analysis and dimension reduction with numerous applications in science and engineering. However, the standard PCA suffers from the fact that the principal components (PCs) are usually linear combinations of all the original variables, and it is thus often difficult to interpret the PCs. To alleviate this drawback, various sparse PCA approaches were proposed in the literature (Cadima and Jolliffe in J Appl Stat 22:203–214, 1995; d’Aspremont et al. in J Mach Learn Res 9:1269–1294, 2008; d’Aspremont et al. SIAM Rev 49:434–448, 2007; Jolliffe in J Appl Stat 22:29–35, 1995; Journée et al. in J Mach Learn Res 11:517–553, 2010; Jolliffe et al. in J Comput Graph Stat 12:531–547, 2003; Moghaddam et al. in Advances in neural information processing systems 18:915–922, MIT Press, Cambridge, 2006; Shen and Huang in J Multivar Anal 99(6):1015–1034, 2008; Zou et al. in J Comput Graph Stat 15(2):265–286, 2006). Despite success in achieving sparsity, some important properties enjoyed by the standard PCA are lost in these methods such as uncorrelation of PCs and orthogonality of loading vectors. Also, the total explained variance that they attempt to maximize can be too optimistic. In this paper we propose a new formulation for sparse PCA, aiming at finding sparse and nearly uncorrelated PCs with orthogonal loading vectors while explaining as much of the total variance as possible. We also develop a novel augmented Lagrangian method for solving a class of nonsmooth constrained optimization problems, which is well suited for our formulation of sparse PCA. We show that it converges to a feasible point, and moreover under some regularity assumptions, it converges to a stationary point. Additionally, we propose two nonmonotone gradient methods for solving the augmented Lagrangian subproblems, and establish their global and local convergence.
Finally, we compare our sparse PCA approach with several existing methods on synthetic (Zou et al. in J Comput Graph Stat 15(2):265–286, 2006), Pitprops (Jeffers in Appl Stat 16:225–236, 1967), and gene expression data (Chin et al. in Cancer Cell 10:529–541, 2006), respectively. The computational results demonstrate that the sparse PCs produced by our approach substantially outperform those by other methods in terms of total explained variance, correlation of PCs, and orthogonality of loading vectors. Moreover, the experiments on random data show that our method is capable of solving large-scale problems within a reasonable amount of time.
4.
5.
We present the basic idea of abstract principal component analysis (APCA) as a general approach that extends various popular data analysis techniques such as PCA and GPCA. We describe the mathematical theory behind APCA and focus on a particular application to mode extractions from a data set of mixed temporal and spatial signals. For illustration, algorithmic implementation details and numerical examples are presented for the extraction of a number of basic types of wave modes including, in particular, dynamic modes involving spatial shifts.
6.
Sparse principal component analysis (PCA), an important variant of PCA, attempts to find sparse loading vectors when conducting dimension reduction. This paper considers the nonsmooth Riemannian optimization problem associated with the ScoTLASS model for sparse PCA which can impose orthogonality and sparsity simultaneously. A Riemannian proximal method is proposed in the work of Chen et al. for the efficient solution of this optimization problem. In this paper, two acceleration schemes are introduced. First and foremost, we extend the FISTA method from the Euclidean space to the Riemannian manifold to solve sparse PCA, leading to the accelerated Riemannian proximal gradient method. Since the Riemannian optimization problem for sparse PCA is essentially nonconvex, a restarting technique is adopted to stabilize the accelerated method without sacrificing the fast convergence. Second, a diagonal preconditioner is proposed for the Riemannian proximal subproblem which can further accelerate the convergence of the Riemannian proximal methods. Numerical evaluations establish the computational advantages of the proposed methods over the existing proximal gradient methods on a manifold. Additionally, a short result concerning the convergence of the Riemannian subgradients of a sequence is established, which, together with the result in the work of Chen et al., can show the stationary point convergence of the Riemannian proximal methods.
7.
F. H. Ruymgaart 《Journal of multivariate analysis》1981,11(4):485-497
A robust principal component analysis for samples from a bivariate distribution function is described. The method is based on robust estimators for dispersion in the univariate case along with a certain linearization of the bivariate structure. Besides the continuity of the functional defining the direction of the suitably modified principal axis, we prove consistency of the corresponding sequence of estimators. Asymptotic normality is established under some additional conditions.
8.
In this paper, we apply orthogonally equivariant spatial sign covariance matrices as well as their affine equivariant counterparts in principal component analysis. The influence functions and asymptotic covariance matrices of eigenvectors based on robust covariance estimators are derived in order to compare the robustness and efficiency properties. We show in particular that the estimators that use pairwise differences of the observed data have very good efficiency properties, providing practical robust alternatives to classical sample covariance matrix based methods.
9.
Summary: In the present paper empirical influence functions (EIFs) are derived for eigenvalues and eigenfunctions in functional principal
component analysis in both cases where the smoothing parameter is fixed and unfixed. Based on the derived influence functions
a sensitivity analysis procedure is proposed for detecting jointly as well as singly influential observations. A numerical
example is given to show the usefulness of the proposed procedure. In dealing with the influence on the eigenfunctions two
different kinds of influence statistics are introduced. One is based on the EIF for the coefficient vectors of the basis function
expansion, and the other is based on the sampled vectors of the functional EIF. Under a certain condition it can be proved
that both kinds of statistics provide essentially equivalent results.
10.
Julie Josse, Jérôme Pagès, François Husson 《Advances in Data Analysis and Classification》2011,5(3):231-246
The available methods to handle missing values in principal component analysis only provide point estimates of the parameters (axes and components) and estimates of the missing values. To take into account the variability due to missing values, a multiple imputation method is proposed. First, a method to generate multiple imputed data sets from a principal component analysis model is defined. Then, two ways to visualize the uncertainty due to missing values on the principal component analysis results are described. The first one consists in projecting the imputed data sets onto a reference configuration as supplementary elements to assess the stability of the individuals (respectively of the variables). The second one consists in performing a principal component analysis on each imputed data set and fitting each obtained configuration onto the reference one with Procrustes rotation. The latter strategy makes it possible to assess the variability of the principal component analysis parameters induced by the missing values. The methodology is then evaluated on a real data set.
11.
Functional principal component analysis is the preliminary step to represent the data in a lower dimensional space and to
capture the main modes of variability of the data by means of a small number of components which are linear combinations of
original variables. The sensitivity of the variance and covariance functions to irregular observations makes this method vulnerable
to outliers, so that it may not capture the variation of the regular observations. In this study, we propose a robust functional principal
component analysis to find the linear combinations of the original variables that contain most of the information, even if
there are outliers, and to flag functional outliers. We demonstrate the performance of the proposed method in an extensive
simulation study and on two datasets from chemometrics and environment.
12.
Kernel principal component analysis (KPCA) extends linear PCA from a real vector space to any high dimensional kernel feature space. The sensitivity of linear PCA to outliers is well-known and various robust alternatives have been proposed in the literature. For KPCA such robust versions have received considerably less attention. In this article we present kernel versions of three robust PCA algorithms: spherical PCA, projection pursuit and ROBPCA. These robust KPCA algorithms are analyzed in a classification context applying discriminant analysis on the KPCA scores. The performances of the different robust KPCA algorithms are studied in a simulation study comparing misclassification percentages, both on clean and contaminated data. An outlier map is constructed to visualize outliers in such classification problems. A real life example from protein classification illustrates the usefulness of robust KPCA and its corresponding outlier map.
13.
14.
In this paper, we propose auto-associative (AA) models to generalize Principal component analysis (PCA). AA models have been introduced in data analysis from a geometrical point of view. They are based on the approximation of the observations scatter-plot by a differentiable manifold. In this paper, they are interpreted as Projection pursuit models adapted to the auto-associative case. Their theoretical properties are established and are shown to extend the PCA ones. An iterative algorithm of construction is proposed and its principle is illustrated both on simulated and real data from image analysis.
15.
Classically principal component analysis is one of the most used techniques for exploring the multivariate association pattern of variables. On the other hand, conditioning is one of the most promising ideas for controlling the variability of observed data. Here we present a review of some conditioning methods from the analysis of residuals of a parametric model to the analysis of the local variation defined by means of a non‐oriented graph of individuals, this variation being defined from the deviation from a local mean or alternatively from the differences among contiguous vertices. We will compare these approaches and will show that under some conditions they give comparable results. Finally, we will present an example of application to illustrate the results previously stated. Copyright © 2000 John Wiley & Sons, Ltd.
16.
Computing estimates in functional principal component analysis (FPCA) from discrete data is usually based on the approximation
of sample curves in terms of a basis (splines, wavelets, trigonometric functions, etc.) and a geometrical structure in the
data space ($L^2$ spaces, Sobolev spaces, etc.). Until now, the computational efforts have been focused on developing ad hoc algorithms to
approximate those estimates by previously selecting an efficient approximating technique and a convenient geometrical structure.
The main goal of this paper consists of establishing a procedure to formulate the algorithm for computing estimates of FPCA
under general settings. The resulting algorithm is based on the classic multivariate PCA of a certain random vector and can
thus be implemented in the majority of statistical packages. In fact, it is derived from the analysis of the effects of modifying
the norm in the space of coordinates. Finally, an application on real data will be developed to illustrate the
theoretical results derived in this way.
This research has been supported by Project MTM2004-5992 from Dirección General de Investigación, Ministerio de Ciencia y
Tecnología.
17.
For Principal Component Analysis in Reproducing Kernel Hilbert Spaces (KPCA), optimization over sets containing only linear
combinations of all n-tuples of kernel functions is investigated, where n is a positive integer smaller than the number of data. Upper bounds on the accuracy in approximating the optimal solution,
achievable without restrictions on the number of kernel functions, are derived. The rates of decrease of the upper bounds
for increasing number n of kernel functions are given by the summation of two terms, one proportional to $n^{-1/2}$ and the other to $n^{-1}$,
and depend on the maximum eigenvalue of the Gram matrix of the kernel with respect to the data. Primal and dual formulations
of KPCA are considered. The estimates provide insights into the effectiveness of sparse KPCA techniques, aimed at reducing
the computational costs of expansions in terms of kernel units.
18.
C. C. Chang 《BIT Numerical Mathematics》1988,28(2):205-214
This paper is concerned with the allocation of multi-attribute records on several disks so as to achieve a high degree of concurrency of disk access when responding to partial match queries. An algorithm to distribute a set of multi-attribute records onto different disks is presented. Since our allocation method will use the principal component analysis, this concept is first introduced. We then use it to generate a set of real numbers which are the projections on the first principal component direction and can be viewed as hashing addresses. Then we propose an algorithm based upon these hashing addresses to allocate multi-attribute records onto different disks. Some experimental results show that our method can indeed be used to solve the multi-disk data allocation problem for concurrent accessing.
19.
Antoine Saucier 《Applied and Computational Harmonic Analysis》2005,18(3):300-328
We show that orthonormal bases of functions with multiscale compact supports can be obtained from a generalization of principal component analysis. These functions, called multiscale principal components (MPCs), are eigenvectors of the correlation operator expressed in different vector subspaces. MPCs are data-adaptive functions that minimize their correlation with the reference signal. Using MPCs, we construct orthogonal bases which are similar to dyadic wavelet bases. We observe that MPCs are natural wavelets, i.e. their average is zero or nearly zero if the signal has a dominantly low-pass spectrum. We show that MPCs perform well in simple data compression experiments, in the presence or absence of singularities. We also introduce concentric MPCs, which are orthogonal basis functions having multiscale concentric supports. Used as kernels in convolution products with a signal, these functions make it possible to define a wavelet transform that has a striking capacity to emphasize atypical patterns.
20.
Kohei Adachi 《Advances in Data Analysis and Classification》2011,5(1):23-36
Principal component analysis (PCA) of an objects × variables data matrix is used for obtaining a low-dimensional biplot configuration, where data are approximated by the inner products of the vectors corresponding to objects and variables. Borg and Groenen (Modern multidimensional scaling. Springer, New York, 1997) have suggested another biplot procedure which uses a technique for approximating data by projections of object vectors on variable vectors. This technique is formulated as constraining the variable vectors in PCA to be of unit length and can be called unit-length vector analysis (UVA). However, an algorithm for UVA has not yet been developed. In this paper, we present such an algorithm, discuss the properties of UVA solutions, and demonstrate the advantage of UVA in biplots for standardized data with homogeneous variances among variables. The advantage of UVA-based biplots is that the projections of object vectors onto variable vectors express the approximation of data in an easy way, while in PCA-based biplots we must consider not only the projections, but also the lengths of variable vectors in order to visualize approximations.
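The remark in the last abstract about PCA-based biplots can be illustrated with a small sketch. It is not the UVA algorithm from the paper, which is not described in this listing. It only shows the standard SVD-based PCA biplot and the identity that each approximated entry equals the projection of an object vector onto a variable vector multiplied by that variable vector's length. The function names `pca_biplot` and `projections_and_lengths` and the random test matrix are assumptions made for illustration.

```python
import numpy as np

def pca_biplot(X, k=2):
    """Rank-k PCA biplot of an objects x variables matrix X (hypothetical helper).

    Returns the column-centered data Xc, object vectors F (n x k) and
    variable vectors A (p x k), so that F @ A.T approximates Xc.
    """
    Xc = X - X.mean(axis=0)                       # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    F = U[:, :k] * s[:k]                          # object (row) coordinates
    A = Vt[:k].T                                  # variable (column) coordinates
    return Xc, F, A

def projections_and_lengths(F, A):
    """Split each inner product f_i . a_j into projection times length."""
    lengths = np.linalg.norm(A, axis=1)           # length of each variable vector
    directions = A / lengths[:, None]             # unit-length variable vectors
    proj = F @ directions.T                       # projection of f_i onto direction of a_j
    return proj, lengths

# Illustrative data only: 8 objects, 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
Xc, F, A = pca_biplot(X, k=2)
proj, lengths = projections_and_lengths(F, A)
approx = F @ A.T                                  # biplot approximation of Xc
# In a PCA biplot the approximation is projection * variable-vector length.
assert np.allclose(approx, proj * lengths)
```

When every variable vector has unit length, `lengths` is all ones and the projections alone give the approximation, which is the reading advantage the abstract attributes to UVA-based biplots.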