Similar literature (20 records)
1.
Principal component analysis (PCA) is an important tool for dimension reduction in multivariate analysis. Regularized PCA methods, such as sparse PCA and functional PCA, have been developed to incorporate special features in many real applications. Sometimes additional variables (referred to as supervision) are measured on the same set of samples, which can potentially drive low-rank structures of the primary data of interest. Classical PCA methods cannot make use of such supervision data. In this article, we propose a supervised sparse and functional principal component (SupSFPC) framework that can incorporate supervision information to recover underlying structures that are more interpretable. The framework unifies and generalizes several existing methods and flexibly adapts to the practical scenarios at hand. The SupSFPC model is formulated in a hierarchical fashion using latent variables. We develop an efficient modified expectation-maximization (EM) algorithm for parameter estimation. We also implement fast data-driven procedures for tuning parameter selection. Our comprehensive simulation and real data examples demonstrate the advantages of SupSFPC. Supplementary materials for this article are available online.
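A minimal sketch of the supervised low-rank idea, assuming a model of the form X ≈ UVᵀ in which the scores U are partially driven by supervision Y (U ≈ YB). The alternating least-squares updates and the shrinkage weight `alpha` below are illustrative assumptions; they are not the paper's modified EM algorithm, and the sparsity/functional penalties of SupSFPC are omitted.

```python
import numpy as np

def supervised_pca_als(X, Y, r=2, alpha=0.5, n_iter=50):
    """Crude alternating-least-squares sketch of a supervised low-rank model:
    X ~ U V^T, with the scores U shrunk toward a supervised fit Y B.
    alpha in [0, 1] controls how strongly the supervision pulls the scores."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:r].T                                   # p x r loadings, initialized from PCA
    for _ in range(n_iter):
        U = X @ V                                  # n x r scores given loadings
        B, *_ = np.linalg.lstsq(Y, U, rcond=None)  # regress scores on supervision
        U = (1 - alpha) * U + alpha * (Y @ B)      # shrink scores toward supervised fit
        V, *_ = np.linalg.lstsq(U, X, rcond=None)  # update loadings given scores
        V = V.T
        Q, _ = np.linalg.qr(V)                     # re-orthonormalize loadings
        V = Q[:, :r]
    return U, V, B
```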

2.
A method for determining combined weights based on grey relational analysis and principal component analysis
To improve the accuracy and objectivity of the weights used in comprehensive evaluation models, a combined-weight model is built on the basis of principal component analysis and grey relational analysis. The combination corrects the shortcomings of each method when used alone to determine weights, offering a new approach to weight determination in comprehensive evaluation systems.
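A hedged sketch of one way to form the two weight vectors and combine them. The grey-relational weighting step and the convex mixing rule (`theta`) are assumptions, since the abstract does not spell out the combination formula.

```python
import numpy as np

def pca_weights(X):
    """Objective indicator weights from PCA: variance-ratio-weighted absolute loadings."""
    Z = (X - X.mean(0)) / X.std(0, ddof=1)
    vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    w = np.abs(vecs) @ (vals / vals.sum())         # each indicator's overall contribution
    return w / w.sum()

def grey_relational_weights(X, rho=0.5):
    """Weights from grey relational coefficients against an ideal (column-wise best) reference.
    Indicators are assumed larger-is-better; turning grades into weights is a modeling choice."""
    Z = (X - X.min(0)) / (X.max(0) - X.min(0))
    diff = np.abs(Z - Z.max(0))
    xi = (diff.min() + rho * diff.max()) / (diff + rho * diff.max())
    grade = xi.mean(0)
    return grade / grade.sum()

def combined_weights(X, theta=0.5):
    """Simple convex combination of the two weight vectors; theta is an assumed mixing choice."""
    return theta * pca_weights(X) + (1 - theta) * grey_relational_weights(X)
```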

3.
Given a set of signals, the classical construction of an optimal truncatable basis for representing them is principal component analysis (PCA). When the information one has about the signals is a more general property, such as smoothness, a different basis should be considered. One example is the Fourier basis, which is optimal for representing smooth functions sampled on a regular grid; it consists of the eigenfunctions of the circulant Laplacian operator. In this paper, based on the optimality of the eigenfunctions of the Laplace-Beltrami operator (LBO), the construction of PCA for geometric structures is regularized. By assuming smoothness of the given data, one can exploit the intrinsic geometric structure to regularize the construction of a basis by which the observed data are represented. The LBO can be decomposed to provide a representation space optimized for both the internal structure and the external observations. The proposed model takes the best from both the intrinsic and the extrinsic structure of the data and provides an optimal smooth representation of shapes and forms.
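A minimal sketch of the underlying idea, assuming a 1-D signal on a regular grid so that a path-graph Laplacian can stand in for the Laplace-Beltrami operator; the paper itself works with the LBO on geometric structures (surfaces), which this toy example does not attempt.

```python
import numpy as np

def grid_laplacian(n):
    """Path-graph Laplacian for a length-n signal (a simple discrete LBO surrogate)."""
    L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0          # endpoints have degree 1
    return L

def smooth_basis(n, k):
    """First k Laplacian eigenvectors: a Fourier-like basis for smooth signals."""
    vals, vecs = np.linalg.eigh(grid_laplacian(n))
    return vecs[:, :k]                 # columns ordered by increasing eigenvalue

# Represent noisy-but-smooth signals in the Laplacian eigenbasis instead of a data-driven PCA basis.
n, k = 200, 10
t = np.linspace(0, 1, n)
signals = np.sin(2 * np.pi * t)[None, :] + 0.1 * np.random.randn(30, n)
B = smooth_basis(n, k)
coeffs = signals @ B                   # project onto the smooth basis
reconstruction = coeffs @ B.T          # smooth low-dimensional representation
```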

4.
A leaf classification method is proposed. The data contain both shape and texture information for each leaf. In the preprocessing stage, principal component analysis reduces the dimensionality of the raw data, extracting 3 principal components from 16 features with a cumulative contribution rate above 85%. In the analysis stage, a support vector machine classifies the leaves, and particle swarm optimization tunes the SVM parameters to improve classification accuracy. Experiments show that the PSO-tuned SVM reaches the highest accuracy, 94.1%, exceeding the same classifier tuned by a genetic algorithm or by grid search.
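A sketch of the pipeline under stated substitutions: random placeholder data stand in for the 16 leaf features, and a cross-validated grid search stands in for the particle swarm optimization used in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: n_leaves x 16 shape/texture features and species labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))
y = rng.integers(0, 5, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Keep 3 components, as in the abstract (~85% cumulative variance on the real data).
pipe = make_pipeline(StandardScaler(), PCA(n_components=3), SVC(kernel="rbf"))

# The paper tunes (C, gamma) with particle swarm optimization; a plain
# cross-validated grid search is used here as a simpler stand-in.
grid = GridSearchCV(pipe, {"svc__C": [1, 10, 100], "svc__gamma": [0.01, 0.1, 1.0]}, cv=5)
grid.fit(X_tr, y_tr)
print("test accuracy:", grid.score(X_te, y_te))
```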

5.
In this article, we propose a new framework for matrix factorization based on principal component analysis (PCA) where sparsity is imposed. The structure to impose sparsity is defined in terms of groups of correlated variables found in correlation matrices or maps. The framework is based on three new contributions: an algorithm to identify the groups of variables in correlation maps, a visualization for the resulting groups, and a matrix factorization. Together with a method to compute correlation maps with minimum noise level, referred to as missing-data for exploratory data analysis (MEDA), these three contributions constitute a complete matrix factorization framework. Two real examples are used to illustrate the approach and compare it with PCA, sparse PCA, and structured sparse PCA. Supplementary materials for this article are available online.
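A rough sketch of the two ingredients, assuming groups are found simply by thresholding the absolute correlation matrix and taking connected components; this is not the paper's MEDA-based correlation maps or its specific grouping algorithm, only an illustration of group-structured sparsity.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def correlation_groups(X, threshold=0.7):
    """Group variables whose absolute correlation exceeds a threshold
    (connected components of the thresholded correlation map)."""
    R = np.corrcoef(X, rowvar=False)
    adj = (np.abs(R) >= threshold).astype(int)
    np.fill_diagonal(adj, 0)
    n_groups, labels = connected_components(csr_matrix(adj), directed=False)
    return [np.where(labels == g)[0] for g in range(n_groups)]

def groupwise_rank1_factorization(X, groups):
    """One loading vector per group, zero outside the group: a crude group-sparse factorization."""
    n, p = X.shape
    loadings = np.zeros((p, len(groups)))
    scores = np.zeros((n, len(groups)))
    for j, idx in enumerate(groups):
        u, s, vt = np.linalg.svd(X[:, idx], full_matrices=False)
        loadings[idx, j] = vt[0]
        scores[:, j] = u[:, 0] * s[0]
    return scores, loadings
```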

6.
In many atmospheric and earth sciences, it is of interest to identify dominant spatial patterns of variation based on data observed at p locations and n time points with the possibility that p > n. While principal component analysis (PCA) is commonly applied to find the dominant patterns, the eigenimages produced from PCA may exhibit patterns that are too noisy to be physically meaningful when p is large relative to n. To obtain more precise estimates of eigenimages, we propose a regularization approach incorporating smoothness and sparseness of eigenimages, while accounting for their orthogonality. Our method allows data taken at irregularly spaced or sparse locations. In addition, the resulting optimization problem can be solved using the alternating direction method of multipliers, which is easy to implement, and applicable to a large spatial dataset. Furthermore, the estimated eigenfunctions provide a natural basis for representing the underlying spatial process in a spatial random-effects model, from which spatial covariance function estimation and spatial prediction can be efficiently performed using a regularized fixed-rank kriging method. Finally, the effectiveness of the proposed method is demonstrated by several numerical examples.
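A much simpler one-component heuristic, not the paper's ADMM solution of the orthogonality-constrained problem: penalized power iterations that alternate a smoothing step (through a spatial graph Laplacian `L`, assumed supplied by the user) with soft-thresholding for sparsity.

```python
import numpy as np

def soft(x, lam):
    """Elementwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_smooth_eigenimage(X, L, lam=0.1, alpha=1.0, n_iter=100):
    """Leading 'eigenimage' with smoothness (ridge-type Laplacian penalty) and sparsity
    (soft-thresholding), via penalized power iterations.
    X: n_time x p_locations data matrix; L: p x p Laplacian of the spatial graph."""
    p = X.shape[1]
    S = X.T @ X / X.shape[0]                      # sample spatial covariance
    M = np.linalg.inv(np.eye(p) + alpha * L)      # smoothing operator
    v = np.linalg.svd(X, full_matrices=False)[2][0]
    for _ in range(n_iter):
        v = M @ (S @ v)                           # smooth the power-iteration update
        v = soft(v, lam * np.max(np.abs(v)))      # sparsify
        nrm = np.linalg.norm(v)
        if nrm == 0:
            break
        v /= nrm
    return v
```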

7.
8.
Principal component analysis (PCA) is a popular dimension-reduction method to reduce the complexity and obtain the informative aspects of high-dimensional datasets. When the data distribution is skewed, data transformation is commonly used prior to applying PCA. Such transformation is usually obtained from previous studies, prior knowledge, or trial-and-error. In this work, we develop a model-based method that integrates data transformation in PCA and finds an appropriate data transformation using the maximum profile likelihood. Extensions of the method to handle functional data and missing values are also developed. Several numerical algorithms are provided for efficient computation. The proposed method is illustrated using simulated and real-world data examples. Supplementary materials for this article are available online.
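A two-step stand-in rather than the paper's joint profile-likelihood estimation: per-variable Box-Cox transforms (with SciPy's maximum-likelihood choice of lambda for marginal normality) followed by ordinary PCA. It only illustrates the transform-then-reduce idea.

```python
import numpy as np
from scipy.stats import boxcox
from sklearn.decomposition import PCA

def boxcox_then_pca(X, n_components=2):
    """Per-variable Box-Cox transform, then PCA on the standardized transformed data.
    The paper instead chooses the transformation jointly with the PCA model."""
    Xt = np.empty_like(X, dtype=float)
    lambdas = []
    for j in range(X.shape[1]):
        col = np.asarray(X[:, j], dtype=float)
        col = col - col.min() + 1e-6       # Box-Cox requires strictly positive data
        Xt[:, j], lam = boxcox(col)        # lambda chosen by maximum likelihood
        lambdas.append(lam)
    Xt = (Xt - Xt.mean(0)) / Xt.std(0, ddof=1)
    pca = PCA(n_components=n_components).fit(Xt)
    return pca, lambdas
```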

9.
Glycosylation is an important form of post-translational protein modification, and O-linked glycosylation is one of its major types, so predicting O-glycosylation sites is of considerable importance. Using protein sequence windows of length 41 encoded with a sparse (one-hot) coding, principal component analysis is applied to study the structural characteristics of O-glycosylated protein sequences. On the extracted principal components, a single-hidden-layer BP neural network (256-8-4) is designed to predict O-glycosylation sites and divide the protein sequences into 4 classes. Compared with classifying directly with a BP neural network, the proposed method is shown experimentally to be both faster and more accurate, with prediction accuracy of 80-90%.
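A sketch of the encode-reduce-classify pipeline. The one-hot encoding details, the number of retained components, and scikit-learn's MLP (standing in for the 256-8-4 BP network) are illustrative assumptions; the training data are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

AA = "ACDEFGHIKLMNPQRSTVWY"              # 20 standard amino acids
AA_INDEX = {a: i for i, a in enumerate(AA)}

def sparse_encode(window):
    """One-hot ('sparse') encoding of a 41-residue window -> length-820 vector."""
    vec = np.zeros(len(window) * 20)
    for pos, aa in enumerate(window):
        if aa in AA_INDEX:               # unknown residues stay all-zero
            vec[pos * 20 + AA_INDEX[aa]] = 1.0
    return vec

def train_predictor(windows, labels, n_components=50):
    """windows: 41-residue strings centered on candidate sites; labels: one of 4 classes.
    n_components is a free parameter here (the abstract's network has 256 inputs)."""
    X = np.vstack([sparse_encode(w) for w in windows])
    X = PCA(n_components=n_components).fit_transform(X)
    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    return clf.fit(X, labels)
```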

10.
We propose fast and scalable statistical methods for the analysis of hundreds or thousands of high-dimensional vectors observed at multiple visits. The proposed inferential methods do not require loading the entire dataset at once in the computer memory and instead use only sequential access to data. This allows deployment of our methodology on low-resource computers where computations can be done in minutes on extremely large datasets. Our methods are motivated by and applied to a study where hundreds of subjects were scanned using Magnetic Resonance Imaging (MRI) at two visits roughly five years apart. The original data possess over ten billion measurements. The approach can be applied to any type of study where data can be unfolded into a long vector including densely observed functions and images. Supplemental materials are provided with source code for simulations, some technical details and proofs, and additional imaging results of the brain study.
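The paper's streaming estimators are its own; scikit-learn's IncrementalPCA is used below only to illustrate the sequential-access idea on data stored in chunk files. The file layout and names are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def streaming_pca(chunk_files, n_components=20, batch_rows=500):
    """Fit PCA using only sequential access to data: each file holds one block of the
    unfolded subjects-by-voxels matrix and is loaded, used, and discarded."""
    ipca = IncrementalPCA(n_components=n_components)
    for path in chunk_files:
        block = np.load(path)                     # rows = subject-visits, cols = voxels
        for start in range(0, block.shape[0], batch_rows):
            batch = block[start:start + batch_rows]
            if batch.shape[0] >= n_components:    # partial_fit needs >= n_components rows
                ipca.partial_fit(batch)
    return ipca
```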

11.
Application of the principal component projection method to the statistical analysis of listed companies in the securities market
Using the principal component projection method, this paper gives a comprehensive evaluation of nine performance indicators for 45 stocks in the securities market. The indicator weights are determined by the entropy weight method, yielding a ranking of these companies' financial positions in 2000.
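A sketch of the entropy weight step plus a simple weighted composite score for ranking; the principal component projection construction itself is only loosely followed, and the min-max scaling assumes larger-is-better indicators.

```python
import numpy as np

def entropy_weights(X):
    """Entropy weight method: indicators whose values differ more across firms carry
    more information and get larger weights. X: n_firms x p_indicators, positive values."""
    P = X / X.sum(axis=0)                            # share of each firm in each indicator
    P = np.where(P == 0, 1e-12, P)                   # avoid log(0)
    n = X.shape[0]
    e = -(P * np.log(P)).sum(axis=0) / np.log(n)     # entropy of each indicator, in [0, 1]
    d = 1.0 - e                                      # degree of diversification
    return d / d.sum()

def composite_scores(X, weights):
    """Weighted composite score for ranking firms (projection onto the weight vector)."""
    Z = (X - X.min(0)) / (X.max(0) - X.min(0))
    return Z @ weights
```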

12.
Principal component analysis of environmental quality
Principal component analysis replaces the original multidimensional variables with a small number of composite variables while minimizing the loss of information in the original data, greatly simplifying the data structure. It also determines weights objectively and avoids subjective arbitrariness, making it a simple, practical, and effective method for comprehensive evaluation of environmental quality. The analysis can provide an important theoretical basis for zoning environmental quality and managing it by grade.

13.
Motivated by a statistical learning theoretic treatment of principal component analysis, we are concerned with the set of points in ℝ^d that are within a certain distance from a k-dimensional affine subspace. We prove that the VC dimension of the class of such sets is within a constant factor of (k+1)(d-k+1), and then discuss the distribution of eigenvalues of a data covariance matrix by using our bounds on the VC dimension and Vapnik's statistical learning theory. In the course of the upper bound proof, we provide a simple proof of Warren's bound on the number of sign sequences of real polynomials.

14.
In scientific research, observed data are often used to study the relationship between the principal components of a complex system and the observed variables; this is one of the most fundamental problems of science and is known as principal component analysis. Many papers study the optimality of PCA, but because Eastern and Western philosophies of understanding the world differ, the historical approaches to computing and justifying PCA in the East and the West diverge considerably. By classifying data with symmetric designs and comparing the Eastern and Western computational approaches, the paper argues that the Eastern xiangshu (image-number) approach to computing principal components is reproducible, whereas the Western approach is not, and concludes that from the standpoint of reproducibility the Eastern xiangshu approach is the more scientific of the two.

15.
Building on the classical literature, 28 basic indicators of financial security are first selected, and 13 further indicators are added in light of current economic and financial conditions and the strategic orientation of the 13th Five-Year Plan, yielding a new financial security indicator system of 41 indicators across 5 dimensions: macroeconomic security, meso-level financial institution security, micro-level financial market security, external financial security, and financial soft-environment security. Using these 41 indicators, weighted principal-component distance cluster analysis groups the years 2000-2014 for China's financial security and identifies which principal components most strongly influenced the degree and state of financial security over that period. Finally, a comprehensive evaluation of the principal components of China's financial security indicators provides reference suggestions for how the state and enterprises can guard against financial risk and safeguard financial security.
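A rough sketch, assuming that "weighted principal-component distance" means Euclidean distance between variance-ratio-weighted PC scores, followed by Ward hierarchical clustering of the years; the paper's exact weighting and clustering choices may differ.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def weighted_pc_distance_clustering(X, years, n_components=3, n_clusters=3):
    """Cluster years by their distance in a variance-weighted principal-component space.
    X: years x indicators (standardized financial-security indicators)."""
    Z = (X - X.mean(0)) / X.std(0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]       # PC scores of each year
    weights = (s[:n_components] ** 2) / (s ** 2).sum()    # variance-ratio weights
    weighted = scores * np.sqrt(weights)                  # weight the PC coordinates
    tree = linkage(pdist(weighted), method="ward")
    labels = fcluster(tree, n_clusters, criterion="maxclust")
    return dict(zip(years, labels))
```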

16.
This article presents and compares two approaches of principal component (PC) analysis for two-dimensional functional data on a possibly irregular domain. The first approach applies the singular value decomposition of the data matrix obtained from a fine discretization of the two-dimensional functions. When the functions are only observed at discrete points that are possibly sparse and may differ from function to function, this approach incorporates an initial smoothing step prior to the singular value decomposition. The second approach employs a mixed effects model that specifies the PC functions as bivariate splines on triangulations and the PC scores as random effects. We apply the thin-plate penalty for regularizing the function estimation and develop an effective expectation–maximization algorithm for calculating the penalized likelihood estimates of the parameters. The mixed effects model-based approach integrates scatterplot smoothing and functional PC analysis in a unified framework and is shown in a simulation study to be more efficient than the two-step approach that separately performs smoothing and PC analysis. The proposed methods are applied to analyze the temperature variation in Texas using 100 years of temperature data recorded by Texas weather stations. Supplementary materials for this article are available online.
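A sketch of the first (discretize-and-SVD) approach for data on a regular grid, using a Gaussian filter as the initial smoother; the bivariate-spline mixed-effects approach, irregular domains, and sparse observation patterns are not covered.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def functional_pca_2d(images, n_components=3, sigma=2.0):
    """Smooth each 2-D function on a fine grid, unfold to a matrix, and take the SVD.
    images: n x ny x nx array of functions observed on a common regular grid."""
    smoothed = np.stack([gaussian_filter(im, sigma) for im in images])
    n, ny, nx = smoothed.shape
    X = smoothed.reshape(n, ny * nx)
    mean = X.mean(0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    pc_surfaces = Vt[:n_components].reshape(n_components, ny, nx)   # PC "images"
    scores = U[:, :n_components] * s[:n_components]
    return mean.reshape(ny, nx), pc_surfaces, scores
```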

17.
A principal component analysis model for a student evaluation system is established. Students were randomly sampled from the College of Science of China Jiliang University (中国计量学院理学院), real data were collected, and a case study was carried out. The computational results show that the model is effective.

18.
Some issues in using principal component analysis for comprehensive evaluation
Yan Cilin. Some issues in using principal component analysis for comprehensive evaluation. 数理统计与管理, 1998, 17(2), 22-25. The paper examines how the direction (sign) of the eigenvectors affects comprehensive evaluation based on principal component analysis and offers suggestions for correcting the problem.
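A small sketch of the sign issue: the eigenvector's direction is fixed before scoring, using one common convention (loadings summing to a positive value), which may differ from the article's own recommendation. Without such a step, the arbitrary sign returned by an eigen-solver can reverse the ranking.

```python
import numpy as np

def first_pc_score(X):
    """Composite score from the first principal component, with the eigenvector's
    sign fixed so that larger indicator values tend to raise the score."""
    Z = (X - X.mean(0)) / X.std(0, ddof=1)
    vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    v = vecs[:, np.argmax(vals)]           # leading eigenvector
    if v.sum() < 0:                        # simple convention: make loadings mostly positive
        v = -v
    return Z @ v
```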

19.
In this paper, we consider a technique called generic principal component analysis (PCA), which is based on an extension and rigorous justification of standard PCA. Generic PCA is treated as the best weighted linear estimator of a given rank under the condition that the associated covariance matrix is singular. As a result, generic PCA is constructed in terms of pseudo-inverse matrices, which requires the development of special techniques. In particular, we give a solution to a new low-rank matrix approximation problem that provides a basis for generic PCA. Theoretical aspects of generic PCA are carefully studied.

20.
This article proposes a new approach to principal component analysis (PCA) for interval-valued data. Unlike classical observations, which are represented by single points in p-dimensional space ℝ^p, interval-valued observations are represented by hyper-rectangles in ℝ^p, and as such, have an internal structure that does not exist in classical observations. As a consequence, statistical methods for classical data must be modified to account for the structure of the hyper-rectangles before they can be applied to interval-valued data. This article extends the classical PCA method to interval-valued data by using the so-called symbolic covariance to determine the principal component (PC) space to reflect the total variation of interval-valued data. The article also provides a new approach to constructing the observations in a PC space for better visualization. This new representation of the observations reflects their true structure in the PC space. Supplementary materials for this article are available online.
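A sketch using the simpler classical "centers" construction (PCA on the interval midpoints, then interval-valued PC scores obtained by pushing each hyper-rectangle through the loadings); this is not the article's symbolic-covariance PCA or its visualization method.

```python
import numpy as np

def interval_pca_centers(lower, upper, n_components=2):
    """Centers-method sketch for interval-valued PCA.
    lower, upper: n x p arrays with the lower/upper bounds of each interval variable."""
    centers = (lower + upper) / 2.0
    mean = centers.mean(0)
    cov = np.cov(centers - mean, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    V = vecs[:, np.argsort(vals)[::-1][:n_components]]      # p x k loadings
    lo_c = lower - mean
    hi_c = upper - mean
    # Interval score per component: sum of per-variable min/max contributions.
    score_lo = np.minimum(lo_c[:, :, None] * V, hi_c[:, :, None] * V).sum(axis=1)
    score_hi = np.maximum(lo_c[:, :, None] * V, hi_c[:, :, None] * V).sum(axis=1)
    return V, score_lo, score_hi
```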
