Affiliation: | (a) Department of Physical Sciences, University of Helsinki, Box 64, FIN-00014, Helsinki, Finland; (b) Department of Chemical Engineering, Clarkson University, Box 5705, Potsdam, NY 13699-5705, USA |
Abstract: | This work examines the factor analysis of matrices in which the signal-to-noise ratio differs greatly between columns (variables). Such matrices often arise when elemental concentrations are measured in environmental samples. For the strongest variables, the error level may be a few percent; for the weakest, the data may consist almost entirely of noise. This paper demonstrates that proper scaling of weak variables is critical. If a few weak variables are given too high a weight in the analysis, the errors in the computed factors grow, and the increased noise level may obscure the weakest factor(s). The mathematical explanation of this phenomenon is explored by means of Givens rotations. The customary form of principal component analysis (PCA), based on autoscaling the original data, is shown to be generally very ineffective because it scales weak variables much too high. Practical advice is given for dealing with noisy data in both PCA and positive matrix factorization (PMF). |
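The effect described in the abstract can be illustrated with a small synthetic experiment. The sketch below is not taken from the paper: the matrix sizes, the loadings, the noise levels, and the helper `subspace_error` are all assumptions chosen for illustration. It builds a matrix with four strong variables (about 2% error) and twenty weak variables that are pure noise, then compares how well the two leading principal components recover the true two-factor score space under autoscaling and under scaling each column by its assumed noise standard deviation. The noise-based scaling stands in for one plausible "proper" scaling and is not claimed to be the authors' recommendation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_strong, n_weak = 500, 4, 20        # assumed sizes, for illustration only
m = n_strong + n_weak

# Hypothetical two-factor model: factor 1 is strong, factor 2 is weak.
G = rng.normal(size=(n, 2))             # true factor scores
F = np.zeros((2, m))                    # true loadings (zero on weak variables)
F[:, :n_strong] = [[10.0, 8.0, 6.0, 9.0],
                   [1.0, -1.0, 1.5, -0.5]]

# Assumed noise levels: ~2% of the signal for strong variables,
# unit-variance pure noise for the weak variables.
sigma = np.concatenate([0.02 * np.abs(F[0, :n_strong]), np.ones(n_weak)])
X = G @ F + rng.normal(size=(n, m)) * sigma

# Column-center both the data and the true scores.
Xc = X - X.mean(axis=0)
Gc = G - G.mean(axis=0)


def subspace_error(data, scores, k=2):
    """Sine of the largest principal angle between the top-k left singular
    subspace of `data` and the column space of `scores` (0 = perfect)."""
    U = np.linalg.svd(data, full_matrices=False)[0][:, :k]
    Q = np.linalg.qr(scores)[0]
    cosines = np.linalg.svd(U.T @ Q, compute_uv=False)
    return float(np.sqrt(max(0.0, 1.0 - cosines.min() ** 2)))


# Autoscaling: every column gets unit variance, so each pure-noise column
# carries as much weight as a strong variable.
X_auto = Xc / Xc.std(axis=0, ddof=1)

# Noise-based scaling (assumption): every column gets unit noise variance,
# so strong variables dominate and weak ones are down-weighted.
X_noise = Xc / sigma

err_auto = subspace_error(X_auto, Gc)
err_noise = subspace_error(X_noise, Gc)
print(f"autoscaling:         sin(largest angle) = {err_auto:.3f}")
print(f"noise-based scaling: sin(largest angle) = {err_noise:.3f}")
```

In this setup the weak factor contributes far less variance to the autoscaled matrix than the noise columns do, so the second principal component is expected to follow noise rather than the weak factor. With noise-based scaling the weak factor should remain well above the noise floor.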