Discarding or downweighting high-noise variables in factor analytic models
Authors: Pentti Paatero (a), Philip K Hopke (b)
Institution:
a Department of Physical Sciences, University of Helsinki, Box 64, FIN-00014, Helsinki, Finland
b Department of Chemical Engineering, Clarkson University, Box 5705, Potsdam, NY 13699-5705, USA

Abstract: This work examines the factor analysis of matrices where the proportion of signal and noise is very different in different columns (variables). Such matrices often occur when measuring elemental concentrations in environmental samples. In the strongest variables, the error level may be a few percent. For the weakest variables, the data may consist almost entirely of noise. This paper demonstrates that the proper scaling of weak variables is critical. It is found that if a few weak variables are scaled to too high a weight in the analysis, the errors in computed factors would grow, possibly obscuring the weakest factor(s) by the increased noise level. The mathematical explanation of this phenomenon is explored by means of Givens rotations. It is shown that the customary form of principal component analysis (PCA), based on autoscaling the original data, is generally very ineffective because the scaling of weak variables becomes much too high. Practical advice is given for dealing with noisy data in both PCA and positive matrix factorization (PMF).
Keywords: Principal component analysis; Positive matrix factorization; Signal-to-noise; Scaling of variables; Autoscaling; Weak variables; Givens rotations
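
The effect described in the abstract can be illustrated with a small synthetic experiment. The sketch below is not the authors' procedure: the matrix sizes, the loadings, the noise levels, and the choice of dividing each column by an assumed known noise standard deviation are all hypothetical, chosen only to contrast that scaling with autoscaling. It builds a matrix with a few strong variables (about 2% noise) and many weak variables that are pure noise, then measures how well the leading two principal components recover the true factor scores under each scaling.

```python
import numpy as np

def subspace_error(G_true, X_scaled, rank):
    """Relative residual of the true factor scores after projecting them
    onto the leading `rank` left singular vectors of the scaled data."""
    U, _, _ = np.linalg.svd(X_scaled, full_matrices=False)
    U = U[:, :rank]
    residual = G_true - U @ (U.T @ G_true)
    return np.linalg.norm(residual) / np.linalg.norm(G_true)

rng = np.random.default_rng(0)
n, n_strong, n_weak = 300, 6, 30   # hypothetical sizes

# Two hypothetical factors; the second is ten times weaker than the first.
G = rng.standard_normal((n, 2))
F_strong = np.column_stack([
    np.ones(n_strong),
    0.1 * np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]),
])
# Weak variables carry no signal at all in this toy example.
F = np.vstack([F_strong, np.zeros((n_weak, 2))])

# Assumed noise standard deviation of each column (known here by construction).
sigma = np.concatenate([np.full(n_strong, 0.02), np.full(n_weak, 1.0)])
X = G @ F.T + rng.standard_normal((n, n_strong + n_weak)) * sigma

# Center the data and the true scores so both scalings are compared fairly.
Xc = X - X.mean(axis=0)
Gc = G - G.mean(axis=0)

X_auto = Xc / Xc.std(axis=0, ddof=1)   # autoscaling: unit variance per column
X_noise = Xc / sigma                   # scaling by the assumed noise level

err_auto = subspace_error(Gc, X_auto, rank=2)
err_noise = subspace_error(Gc, X_noise, rank=2)
print(f"autoscaling error:   {err_auto:.2f}")
print(f"noise-scaling error: {err_noise:.2f}")
```

Under autoscaling every weak column is inflated to unit variance, so its noise competes directly with the weak second factor and tends to displace it from the leading components. Scaling by the noise level keeps the weak columns at low weight, and the second factor remains recoverable. In real data the noise levels are not known exactly, so the `sigma` vector here stands in for whatever error estimates are available.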
This article is indexed in ScienceDirect and other databases.