Similar literature
 20 similar documents were retrieved.
1.
Sârbu C  Pop HF 《Talanta》2005,65(5):1215-1220
Principal component analysis (PCA) is a favorite tool in environmetrics for data compression and information extraction. PCA finds linear combinations of the original measurement variables that describe the significant variations in the data. However, it is well known that PCA, like any other multivariate statistical method, is sensitive to outliers, missing data, and poor linear correlation between variables due to poorly distributed variables. As a result, data transformations have a large impact on PCA. In this regard, one of the most powerful approaches to improving PCA appears to be fuzzification of the data matrix, which diminishes the influence of the outliers. In this paper we discuss and apply a robust fuzzy PCA algorithm (FPCA). The efficiency of the new algorithm is illustrated on a data set concerning the water quality of the Danube River over a period of 11 consecutive years. Considering, for example, a two-component model, FPCA accounts for 91.7% of the total variance whereas PCA accounts for only 39.8%. Moreover, PCA showed only a partial separation of the variables and no separation of the scores (samples) in the plane described by the first two principal components, whereas a much sharper differentiation of both variables and scores is observed when FPCA is applied.
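The abstract does not spell out the fuzzification step, but a minimal sketch of the general idea (fuzzy membership weights derived from reconstruction residuals, so outlying samples are downweighted in the next loading estimate) might look as follows. The function name, the Cauchy-type weighting rule and all parameters are illustrative assumptions, not the authors' algorithm.

```python
# Hypothetical sketch of a fuzzy-weighted PCA: samples far from the current
# PCA model get low memberships and therefore contribute less to the loadings.
import numpy as np

def fuzzy_pca(X, n_components=2, m=2.0, n_iter=20):
    X = X - X.mean(axis=0)
    w = np.ones(len(X))                          # initial fuzzy memberships
    for _ in range(n_iter):
        Xw = X * w[:, None]                      # downweight suspect samples
        _, _, Vt = np.linalg.svd(Xw, full_matrices=False)
        P = Vt[:n_components].T                  # loadings
        resid = X - (X @ P) @ P.T                # reconstruction residuals
        d2 = (resid ** 2).sum(axis=1)
        w = 1.0 / (1.0 + d2) ** (1.0 / (m - 1.0))  # fuzzy-type membership (an assumption)
    scores = X @ P
    explained = (scores ** 2).sum() / (X ** 2).sum()
    return scores, P, explained

# Example with synthetic data containing a few gross outliers
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
X[:3] += 15.0                                    # outliers
scores, loadings, frac = fuzzy_pca(X)
print(f"variance captured by 2 components: {frac:.1%}")
```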

2.
The main objective of this paper is to introduce principal component analysis and two robust fuzzy principal component algorithms as useful tools for characterizing and comparing rime samples collected at different locations in Poland (2004–2007). The efficiency of the applied procedures was illustrated on a data set containing 108 rime samples and the concentrations of anions, cations and HCHO, as well as pH and conductivity. The fuzzy principal component algorithms achieved better results, mainly because they compress the data more effectively than classical PCA and are very robust to outliers. For example, with a three-component model, fuzzy principal component analysis-first component (FPCA-1) accounts for 62.37% of the total variance and fuzzy principal component analysis-orthogonal (FPCA-o) for 90.11%, whereas PCA accounts for only 58.30%. The first two principal components explain 51.41% of the total variance in the case of FPCA-1 and 79.59% in the case of FPCA-o, compared with only 47.55% for PCA. As a direct consequence, PCA showed only a partial differentiation of the rime samples in the planes or spaces described by different combinations of two or three principal components, whereas a much sharper differentiation of the samples, with respect to their origin and location, is observed when the FPCAs are applied.

3.
Xu CJ  Liang YZ  Li Y  Du YP 《The Analyst》2003,128(1):75-81
Some kinds of chemical data are not only univariate or multivariate observations in the classical statistical sense, but also functions observed continuously. If such special characteristics of the data are handled efficiently, predictive accuracy will certainly improve. In this paper, a novel method named noise perturbation in functional principal component analysis (NPFPCA) is proposed to determine the chemical rank of two-way data. In NPFPCA, after noise is added to the measured data, smooth eigenvectors can be obtained by functional principal component analysis (FPCA). The eigenvectors representing noise are sensitive to the perturbation, whereas those representing chemical components are not. Therefore, by comparing the eigenvectors obtained by FPCA with noise perturbation against those obtained by traditional principal component analysis (PCA), the chemical rank of the system can be determined accurately. Several simulated and real chemical data sets were analyzed to demonstrate the efficiency of the proposed method.
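A simplified stand-in for this idea is sketched below: eigenvectors whose direction is stable under a small noise perturbation are taken as chemical components, while unstable ones are attributed to noise. The functional (smoothing) step of NPFPCA is omitted, plain PCA is perturbed against itself, and the noise level and 0.95 similarity threshold are assumptions.

```python
# Rank estimation by eigenvector stability under noise perturbation (a sketch).
import numpy as np

def estimate_rank(X, noise_level=0.01, threshold=0.95, seed=0):
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    _, _, V0 = np.linalg.svd(Xc, full_matrices=False)
    Xp = Xc + noise_level * Xc.std() * rng.normal(size=Xc.shape)
    _, _, V1 = np.linalg.svd(Xp - Xp.mean(axis=0), full_matrices=False)
    sims = np.abs(np.sum(V0 * V1, axis=1))       # |cos| between matched eigenvectors
    return int(np.sum(sims > threshold)), sims

# Two-component synthetic mixture measured at 50 wavelengths
rng = np.random.default_rng(1)
C = rng.random((30, 2))                           # concentrations
S = rng.random((2, 50))                           # pure spectra
X = C @ S + 0.001 * rng.normal(size=(30, 50))
rank, sims = estimate_rank(X)
print("estimated chemical rank:", rank)
```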

4.
A diagnosis-based robust principal component analysis method
Classical principal component analysis is easily affected by outliers. Based on the characteristics of the method, this paper proposes a new diagnostic approach in which outliers are removed from the multivariate data before principal component analysis is carried out, yielding an effective robust principal component analysis method. Applying this method to two sets of real data gave satisfactory results.
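The abstract does not state which diagnostic is used, so the sketch below substitutes robust Mahalanobis distances (minimum covariance determinant) as the outlier diagnostic before PCA; the cut-off and the synthetic data are likewise assumptions.

```python
# "Diagnose, remove outliers, then run PCA" (a hedged sketch).
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:5] += 10.0                                            # gross outliers

d2 = MinCovDet(random_state=0).fit(X).mahalanobis(X)     # squared robust distances
keep = d2 < chi2.ppf(0.975, df=X.shape[1])               # flag outliers
pca = PCA(n_components=2).fit(X[keep])                   # PCA on the cleaned data
print("removed", (~keep).sum(), "samples;",
      "explained variance:", pca.explained_variance_ratio_.round(3))
```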

5.
The impact of a phosphatic fertilizer plant on the adjacent environment was examined. Selected rare earth elements, heavy metals and metalloids were determined in substrates and products, waste by-products, and grass and soil samples. Concentration gradients of the elements in grass and soil samples along the southerly and easterly directions were examined and compared with the content of interior soil and grass samples, substrates, and products. The results were compared with available data on permissible element concentration levels in soil. Two fuzzy principal component analysis (FPCA) methods for robust estimation of principal components were applied and compared with classical PCA. The efficiency of the new algorithms is illustrated. The investigation explored the impact of the plant on the adjacent environment. The most reliable results, in good agreement with the types of samples, were produced using the FPCA-O algorithm.

6.
Gas chromatograms of fatty acid methyl esters and of volatile lipid oxidation products from fish lipid extracts are analyzed by multivariate data analysis [principal component analysis (PCA)]. Peak alignment is necessary in order to include all sampled points of the chromatograms in the data set. The ability of robust algorithms to deal with outlier problems, including both sample-wise and element-wise outliers, and the advantages and drawbacks of two robust PCA methods, ROBPCA and robust singular value decomposition, when analysing these GC data were investigated. The results show that the use of ROBPCA is advantageous compared with traditional PCA when analysing the entire profile of chromatographic data in cases of sub-optimally aligned data. The study also demonstrates that the choice of the most suitable robust PCA method (sample-wise or element-wise) depends on the type of outliers present in the data set.

7.
Partial least squares (PLS) regression is a linear regression technique developed to relate many regressors to one or several response variables. Robust methods are introduced to reduce or remove the effect of outlying data points. In this paper, we show that if the sample covariance matrix is properly robustified, further robustification of the linear regression steps of the PLS algorithm becomes unnecessary. The robust estimate of the covariance matrix is computed by searching for outliers in univariate projections of the data on a combination of random directions (Stahel-Donoho) and specific directions obtained by maximizing and minimizing the kurtosis coefficient of the projected data, as proposed by Peña and Prieto [1]. It is shown that this procedure is fast to apply and provides better results than other methods proposed in the literature. Its performance is illustrated by Monte Carlo simulation and by an example in which the algorithm reveals features of the data that were undetected by previous methods. Copyright © 2008 John Wiley & Sons, Ltd.
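The overall strategy can be sketched roughly as follows: compute projection-based outlyingness, discard the flagged samples, and then run ordinary PLS on the retained ones. Only random directions are used here (the kurtosis-based directions of Peña and Prieto are omitted), and the cut-off and data are illustrative assumptions, not the paper's procedure.

```python
# Projection-based outlier screening followed by ordinary PLS (a sketch).
import numpy as np
from scipy.stats import chi2
from sklearn.cross_decomposition import PLSRegression

def projection_outlyingness(Z, n_dir=500, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(n_dir, Z.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    P = Z @ D.T                                   # univariate projections
    med = np.median(P, axis=0)
    mad = np.median(np.abs(P - med), axis=0) + 1e-12
    return np.max(np.abs(P - med) / mad, axis=1)  # Stahel-Donoho-type outlyingness

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 6))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=80)
X[:4] += 12.0                                     # contaminate some regressors

out = projection_outlyingness(np.column_stack([X, y]))
keep = out < 3 * np.sqrt(chi2.ppf(0.975, df=X.shape[1] + 1))   # crude cut-off (assumed)
pls = PLSRegression(n_components=2).fit(X[keep], y[keep])
print("kept", keep.sum(), "of", len(y), "samples; R^2 =",
      round(pls.score(X[keep], y[keep]), 3))
```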

8.
Observed data often belong to some specific intervals of values (for instance, in the case of percentages or proportions) or are higher (lower) than pre-specified values (for instance, chemical concentrations are higher than zero). The use of classical principal component analysis (PCA) may lead to extracted components such that the reconstructed data take unfeasible values. In order to cope with this problem, a constrained generalization of PCA is proposed. The new technique, called bounded principal component analysis (B-PCA), detects components such that the reconstructed data are constrained to belong to some pre-specified bounds. This is done by implementing a row-wise alternating least squares (ALS) algorithm, which exploits the potentialities of the least squares with inequality (LSI) algorithm. The results of a simulation study and two applications to bounded data are discussed in order to evaluate how the method and the algorithm for solving it work in practice. Copyright © 2007 John Wiley & Sons, Ltd.
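A rough sketch of the row-wise ALS idea follows: scores are updated row by row under the constraint that the reconstructed data stay inside user-given bounds. A generic SLSQP solver stands in for the dedicated LSI algorithm, the loading update is a plain least-squares step, and the data are synthetic; all of these are simplifying assumptions, not the B-PCA algorithm itself.

```python
# Bounded-reconstruction PCA via row-wise constrained ALS (a sketch).
import numpy as np
from scipy.optimize import minimize

def bounded_pca(X, lb, ub, n_components=2, n_iter=10):
    n, p = X.shape
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_components].T                        # initial loadings
    T = np.zeros((n, n_components))
    for it in range(n_iter):
        if it > 0:
            P = np.linalg.lstsq(T, X, rcond=None)[0].T   # loading update
        for i in range(n):                         # row-wise constrained score update
            cons = {"type": "ineq",
                    "fun": lambda t, Pi=P: np.concatenate([Pi @ t - lb, ub - Pi @ t])}
            obj = lambda t, Pi=P, xi=X[i]: np.sum((xi - Pi @ t) ** 2)
            T[i] = minimize(obj, x0=T[i], constraints=[cons], method="SLSQP").x
    return T, P

# Proportions must stay in [0, 1] after reconstruction
rng = np.random.default_rng(0)
X = rng.beta(2, 2, size=(30, 4))
T, P = bounded_pca(X, lb=np.zeros(4), ub=np.ones(4))
Xhat = T @ P.T
print("reconstruction inside bounds:",
      bool((Xhat >= -1e-6).all() and (Xhat <= 1 + 1e-6).all()))
```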

9.
10.
Principal component analysis (PCA) is widely used as an exploratory data analysis tool in the field of vibrational spectroscopy, particularly near-infrared (NIR) spectroscopy. PCA represents the original spectral data, which contain a large number of variables, by a few feature-containing variables, or scores. Although multiple spectral ranges can be used simultaneously for PCA, only one series of scores, generated by merging the selected spectral ranges, is generally used for qualitative analysis. The combined use of independent series of scores generated from separate spectral ranges has not been exploited. The aim of this study is to evaluate the use of PCA to discriminate between two geographical origins of sesame samples when scores independently generated from separate spectral ranges are optimally combined. An accurate and rapid analytical method to determine the origin is essential for correct value estimation and proper production distribution. Sesame was chosen for this study because its geographical origins are difficult to discriminate visually and its composition is highly complex. For this purpose, we collected diffuse reflectance NIR spectroscopic data from geographically diverse sesame samples over a period of eight years. The discrimination error obtained by applying linear discriminant analysis (LDA) was lower when separate scores from the two spectral ranges were optimally combined than when scores generated from the two merged spectral ranges were used.
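The scheme can be illustrated with a small sketch: PCA is run separately on two spectral ranges, the two sets of scores are concatenated, and LDA discriminates the two origins. The ranges, component counts and synthetic "spectra" are assumptions for illustration only.

```python
# Separate-range PCA scores combined for LDA classification (a sketch).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 120
labels = np.repeat([0, 1], n // 2)                  # two geographical origins
spectra = rng.normal(size=(n, 200))
spectra[labels == 1, 30:60] += 0.4                  # subtle class difference in range 1
spectra[labels == 1, 150:170] -= 0.3                # and in range 2

range1, range2 = spectra[:, 20:80], spectra[:, 140:190]
scores = np.hstack([PCA(n_components=3).fit_transform(range1),
                    PCA(n_components=3).fit_transform(range2)])

acc = cross_val_score(LinearDiscriminantAnalysis(), scores, labels, cv=5).mean()
print(f"cross-validated discrimination accuracy: {acc:.2f}")
```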

11.
In this outline of new approaches to multivariate calibration in chemistry, the following topics are treated: advantages of multivariate calibration over conventional univariate calibration (detecting and eliminating selectivity problems); multivariate calibration methods based on selection of some variables versus methods based on data compression of all the variables; direct versus indirect calibration (pure constituents or known samples for calibration?); calibration methods based on data compression by physical modelling (Beer's law); use of Beer's law in controlled and natural calibration (the generalized least-squares fit and the best linear predictor); extending Beer's law to handle unknown selectivity problems; calibration methods based on data compression by factor modelling (principal component regression and partial least-squares regression); methods for detecting abnormal samples (outliers); and pre-treatments to linearize data.
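Two of the data-compression calibration methods mentioned above, principal component regression (PCR) and partial least squares (PLS), can be contrasted on a synthetic Beer's-law-like mixture data set as sketched below; the data and component counts are illustrative assumptions.

```python
# PCR versus PLS calibration on additive (Beer's law) mixture spectra (a sketch).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
conc = rng.random((60, 3))                           # concentrations of 3 constituents
pure = rng.random((3, 120))                          # pure-constituent spectra
X = conc @ pure + 0.01 * rng.normal(size=(60, 120))  # mixture spectra with noise
y = conc[:, 0]                                       # calibrate for constituent 1

pcr = make_pipeline(PCA(n_components=3), LinearRegression())
pls = PLSRegression(n_components=3)
print("PCR R^2:", cross_val_score(pcr, X, y, cv=5).mean().round(3))
print("PLS R^2:", cross_val_score(pls, X, y, cv=5).mean().round(3))
```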

12.
A journey into low-dimensional spaces with autoassociative neural networks
Daszykowski M  Walczak B  Massart DL 《Talanta》2003,59(6):1095-1105
Data compression and visualization have always been a subject of great interest. Since multidimensional data sets are difficult to interpret and visualize, much attention is devoted to compressing them efficiently. Usually, the compression of dimensionality is considered the first step of exploratory data analysis. Here, we focus our attention on autoassociative neural networks (ANNs), which provide data compression and visualization in a very elegant manner. ANNs can deal with linear and nonlinear correlation among variables, which makes them a very powerful tool in exploratory data analysis. In the literature, ANNs are often referred to as nonlinear principal component analysis (PCA), and due to their specific structure they are also known as bottleneck neural networks. In this paper, ANNs are discussed in detail. Different training modes are described and illustrated on a real example. The usefulness of ANNs for nonlinear data compression and visualization is demonstrated with the aid of the chemical data sets under analysis. A comparison of ANNs with the well-known PCA is also presented.
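A compact sketch of the bottleneck idea is given below: the network is trained to reproduce its own input, and the two-unit middle layer provides the nonlinear scores used for visualization. MLPRegressor is used as a convenient stand-in for a dedicated autoencoder implementation; the architecture, activation and synthetic data are assumptions.

```python
# Autoassociative ("bottleneck") network used as nonlinear PCA (a sketch).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 300)
X = np.column_stack([np.cos(t), np.sin(t), 0.5 * t]) + 0.05 * rng.normal(size=(300, 3))

net = MLPRegressor(hidden_layer_sizes=(16, 2, 16), activation="tanh",
                   max_iter=5000, random_state=0)
net.fit(X, X)                                   # autoassociative: target = input

def bottleneck_scores(net, X, bottleneck_layer=2):
    a = X
    for W, b in zip(net.coefs_[:bottleneck_layer], net.intercepts_[:bottleneck_layer]):
        a = np.tanh(a @ W + b)                  # same activation as the network
    return a                                    # nonlinear "scores" (here 2-D)

scores = bottleneck_scores(net, X)
print("bottleneck scores shape:", scores.shape)
```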

13.
ChemCam is a remote laser-induced breakdown spectroscopy (LIBS) instrument that will arrive on Mars in 2012 on board the Mars Science Laboratory rover. The LIBS technique is crucial for accurately identifying samples and quantifying elemental abundances at various distances from the rover. In this study, we compare different linear and nonlinear multivariate techniques to visualize and discriminate clusters in two dimensions (2D) from the data obtained with ChemCam. We used principal component analysis (PCA) and independent component analysis (ICA) as the linear tools and compared them with the nonlinear Sammon's map projection technique. We demonstrate that the Sammon's map gives the best 2D representation of the data set, with optimization (stress) values from 2.8% to 4.3% (0% being a perfect representation), together with an entropy value of 0.81 for the purity of the clustering analysis. The linear 2D projections result in three (ICA) and five (PCA) times more stress, and their clustering entropy is more than twice as high, at about 1.8. We show that the Sammon's map algorithm is faster and gives a slightly better representation of the data set if the initial conditions are taken from the ICA projection rather than the PCA projection. We conclude that the nonlinear Sammon's map projection is the best technique for combining data visualization and clustering assessment of the ChemCam LIBS data in 2D. PCA and ICA projections onto more dimensions would improve on these numbers, at the cost of the intuitive interpretation of the 2D projection by a human operator.
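The Sammon stress ("optimization value") quoted above scores how faithfully a 2D projection preserves the original inter-point distances, with 0% meaning a perfect representation. The short sketch below evaluates it for a plain PCA projection of synthetic clustered data; the data are an illustrative assumption.

```python
# Computing the Sammon stress of a 2D projection (a sketch).
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.decomposition import PCA

def sammon_stress(X_high, X_low):
    d_high = pdist(X_high)
    d_low = pdist(X_low)
    mask = d_high > 0
    return np.sum((d_high[mask] - d_low[mask]) ** 2 / d_high[mask]) / np.sum(d_high[mask])

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 8)) for c in (0, 3, 6)])
X2 = PCA(n_components=2).fit_transform(X)
print(f"Sammon stress of the PCA projection: {sammon_stress(X, X2):.1%}")
```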

14.
Maximum likelihood principal component analysis (MLPCA) was originally proposed to incorporate measurement error variance information in principal component analysis (PCA) models. MLPCA can be used to fit PCA models in the presence of missing data, simply by assigning very large variances to the non-measured values. An assessment of maximum likelihood missing data imputation is performed in this paper, analysing the MLPCA algorithm and adapting several methods for PCA model building with missing data to their maximum likelihood versions. In this way, known data regression (KDR), KDR with principal component regression (PCR), KDR with partial least squares regression (PLS) and trimmed scores regression (TSR) methods are implemented within the MLPCA method to work as different imputation steps. Six data sets are analysed using several percentages of missing data, comparing the performance of the original algorithm and its adapted regression-based methods with other state-of-the-art methods. Copyright © 2016 John Wiley & Sons, Ltd.
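Giving a missing cell an effectively infinite error variance amounts to letting the PCA model fill it in. The sketch below approximates that idea with simple iterative PCA imputation rather than the full MLPCA algorithm; that simplification, and the synthetic data, are assumptions.

```python
# Iterative PCA imputation as a stand-in for the "huge variance on missing cells" idea.
import numpy as np
from sklearn.decomposition import PCA

def iterative_pca_impute(X, n_components=2, n_iter=50):
    missing = np.isnan(X)
    Xf = np.where(missing, np.nanmean(X, axis=0), X)     # start from column means
    for _ in range(n_iter):
        pca = PCA(n_components=n_components).fit(Xf)
        Xhat = pca.inverse_transform(pca.transform(Xf))
        Xf[missing] = Xhat[missing]                       # update only the missing cells
    return Xf

rng = np.random.default_rng(0)
T = rng.normal(size=(100, 2))
P = rng.normal(size=(2, 8))
X = T @ P + 0.05 * rng.normal(size=(100, 8))
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan             # 20% missing at random
X_imputed = iterative_pca_impute(X_missing)
rmse = np.sqrt(np.mean((X_imputed[np.isnan(X_missing)] - X[np.isnan(X_missing)]) ** 2))
print("imputation RMSE:", round(rmse, 3))
```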

15.
Classification problems have received considerable attention in biological and medical applications. In particular, classification methods combined with microarray technology play an important role in diagnosing and predicting diseases, such as cancer, in medical research. The primary objective in classification is to build an optimal classifier based on the training sample in order to predict the unknown classes of the test sample. In this paper, we propose a unified approach for optimal gene classification in conjunction with functional principal component analysis (FPCA) in the functional data analysis (FNDA) framework, to classify time-course gene expression profiles based on information from their patterns. To derive an optimal classifier in FNDA, we also propose to find the optimal number of basis functions in the smoothing step and of functional principal components in FPCA using a cross-validation technique, and we compare the performance of some popular classification techniques in the proposed setting. We illustrate the proposed method with a simulation study and a real-world data analysis.

16.
A phenomenological study of solubility has been conducted using a combination of quantitative structure-property relationship (QSPR) modelling and principal component analysis (PCA). A solubility database of 4540 experimental data points was used, arranging the available experimental data into a matrix of 154 solvents by 397 solutes. A methodology combining QSPR and PCA was developed to predict the missing values and fill the data matrix. PCA on the resulting filled matrix, where solutes are observations and solvents are variables, shows 92.55% coverage with three principal components. The corresponding transposed matrix, in which solvents are observations and solutes are variables, shows 62.96% coverage with four principal components.

17.
Discarding or downweighting high-noise variables in factor analytic models
This work examines the factor analysis of matrices where the proportion of signal and noise is very different in different columns (variables). Such matrices often occur when measuring elemental concentrations in environmental samples. In the strongest variables, the error level may be a few percent. For the weakest variables, the data may consist almost entirely of noise. This paper demonstrates that the proper scaling of weak variables is critical. It is found that if a few weak variables are scaled to too high a weight in the analysis, the errors in the computed factors grow, possibly obscuring the weakest factor(s) by the increased noise level. The mathematical explanation of this phenomenon is explored by means of Givens rotations. It is shown that the customary form of principal component analysis (PCA), based on autoscaling the original data, is generally very ineffective because the scaling of weak variables becomes much too high. Practical advice is given for dealing with noisy data in both PCA and positive matrix factorization (PMF).
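The scaling issue can be illustrated with a short sketch: autoscaling gives nearly-pure-noise variables the same weight as strong ones, so with many weak variables the first factor can be swamped by noise, whereas scaling each column by an estimate of its error standard deviation keeps the signal factor recoverable. The data, error levels and variable counts are illustrative assumptions.

```python
# Autoscaling versus error-based scaling before PCA (a sketch).
import numpy as np

rng = np.random.default_rng(0)
n, n_weak = 50, 30
signal = rng.normal(size=(n, 1))
strong = signal + 0.02 * rng.normal(size=(n, 2))          # two strong variables, ~2% error
weak = 0.02 * rng.normal(size=(n, n_weak))                # weak variables: almost pure noise
X = np.hstack([strong, weak])
err_sd = np.full(X.shape[1], 0.02)                        # assumed known error levels

def pc1_scores(Z):
    Zc = Z - Z.mean(axis=0)
    return Zc @ np.linalg.svd(Zc, full_matrices=False)[2][0]

for name, Z in [("autoscaled", (X - X.mean(0)) / X.std(0)),
                ("error-scaled", (X - X.mean(0)) / err_sd)]:
    r = np.corrcoef(pc1_scores(Z), signal.ravel())[0, 1]
    print(f"{name:12s} |corr(PC1, true factor)| = {abs(r):.2f}")
```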

18.
Principal component analysis (PCA) is a widespread technique for data analysis that relies on the covariance/correlation matrix of the analyzed data. However, to work properly with high-dimensional data sets, PCA poses severe mathematical constraints on the minimum number of different replicates, or samples, that must be included in the analysis. Generally, improper sampling is due to a small number of data points with respect to the number of degrees of freedom that characterize the ensemble. In the field of life sciences it is often important to have an algorithm that can accept poorly dimensioned data sets, including degenerate ones. Here a new random projection algorithm is proposed, in which a random symmetric matrix acts as a surrogate for the covariance/correlation matrix of PCA while maintaining the data clustering capacity. We demonstrate that what is important for the clustering efficiency of PCA is not the exact form of the covariance/correlation matrix, but simply its symmetry.
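A compact sketch of this idea is shown below: the data are projected onto the leading eigenvectors of a random symmetric matrix instead of the covariance matrix, and the cluster structure is checked afterwards. The data, cluster count and matrix construction are assumptions made for illustration.

```python
# Projection onto eigenvectors of a random symmetric matrix (a sketch).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
p = 40
centers = rng.normal(scale=4.0, size=(3, p))
X = np.vstack([c + rng.normal(size=(30, p)) for c in centers])   # 3 clusters, 90 x 40
labels = np.repeat([0, 1, 2], 30)

A = rng.normal(size=(p, p))
S = (A + A.T) / 2                                    # random symmetric "covariance" surrogate
eigvals, eigvecs = np.linalg.eigh(S)
proj = (X - X.mean(axis=0)) @ eigvecs[:, -2:]        # project onto 2 leading eigenvectors

pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(proj)
print("adjusted Rand index after random symmetric projection:",
      round(adjusted_rand_score(labels, pred), 2))
```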

19.
Environmetrics utilises advanced mathematical, statistical and information tools to extract information. Two typical environmental data sets are analysed using MVATOB (Multi Variate Analysis TOol Box). The first data set corresponds to the variable river salinity. Least median of squares (LMS) detected the outliers, whereas linear least squares (LLS) could not detect and remove them. The second data set consists of daily readings of air quality values. Outliers are detected by LMS and unbiased regression coefficients are estimated by multi-linear regression (MLR). As the explanatory variables are not independent, principal component regression (PCR) and partial least squares regression (PLSR) are used. Both examples demonstrate the superiority of LMS over LLS.
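The contrast between least median of squares and ordinary least squares can be sketched on a contaminated calibration line as below. LMS is approximated here by random elemental subsets, a common way of computing it; the number of trials and the data are illustrative assumptions.

```python
# Least median of squares versus ordinary least squares with gross outliers (a sketch).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 40)
y = 2.0 * x + 1.0 + 0.2 * rng.normal(size=40)
y[:6] += 15.0                                    # gross outliers

def lms_line(x, y, n_trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    best, best_med = None, np.inf
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        med = np.median((y - (slope * x + intercept)) ** 2)   # median of squared residuals
        if med < best_med:
            best, best_med = (slope, intercept), med
    return best

ols = np.polyfit(x, y, 1)                        # ordinary least squares (slope, intercept)
lms = lms_line(x, y)
print("OLS slope/intercept:", np.round(ols, 2))
print("LMS slope/intercept:", np.round(lms, 2))
```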

20.
Smoothed principal component analysis based on the wavelet transform
The wavelet transform has a strong signal-separation capability: random noise can easily be separated from the signal, thereby improving the signal-to-noise ratio. In this paper, the wavelet transform is introduced into factor analysis and a wavelet-transform-based smoothed principal component analysis is proposed. The algorithm retains the orthogonal decomposition of ordinary principal component analysis while gaining the signal-separation capability of the wavelet transform. Results on simulated and experimental data show that the algorithm can extract useful information from data with a low signal-to-noise ratio and improve the signal-to-noise ratio. Processing the experimental data with iterative target transformation factor analysis shows that the results obtained with the wavelet-transform-based smoothed principal component analysis are superior.
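The general idea can be sketched as follows: each spectrum is wavelet-denoised before the principal components are extracted, so the loadings are built from signal rather than random noise. This is a stand-in for the paper's algorithm, which is not fully specified in the abstract; the wavelet, threshold rule and synthetic data are assumptions. Requires the PyWavelets package.

```python
# Wavelet denoising followed by PCA on noisy spectra (a sketch).
import numpy as np
import pywt
from sklearn.decomposition import PCA

def wavelet_denoise(row, wavelet="db4", level=4):
    coeffs = pywt.wavedec(row, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745           # noise estimate, finest scale
    thr = sigma * np.sqrt(2 * np.log(len(row)))               # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(row)]

rng = np.random.default_rng(0)
wl = np.linspace(0, 1, 256)
pure = np.vstack([np.exp(-((wl - 0.3) ** 2) / 0.002),
                  np.exp(-((wl - 0.7) ** 2) / 0.004)])
conc = rng.random((40, 2))
X = conc @ pure + 0.2 * rng.normal(size=(40, 256))            # low signal-to-noise spectra

X_denoised = np.apply_along_axis(wavelet_denoise, 1, X)
print("explained variance (2 PCs), raw:     ",
      PCA(n_components=2).fit(X).explained_variance_ratio_.sum().round(3))
print("explained variance (2 PCs), denoised:",
      PCA(n_components=2).fit(X_denoised).explained_variance_ratio_.sum().round(3))
```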
