Similar literature
 20 similar articles found (search time: 31 ms)
1.
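Smoothed principal component analysis based on the wavelet transform   Cited by: 3 (self-citations: 0, citations by others: 3)
The wavelet transform has a strong capacity for signal separation and can readily separate random noise from a signal, thereby improving the signal-to-noise ratio. This paper introduces the wavelet transform into factor analysis and proposes a smoothed principal component analysis based on the wavelet transform. The algorithm retains the orthogonal decomposition of ordinary principal component analysis while also gaining the signal-separation capability of the wavelet transform. Results on simulated and experimental data show that the algorithm can extract useful information from data with a low signal-to-noise ratio and improve the signal-to-noise ratio. Results of processing the experimental data by iterative target transformation factor analysis show that the results of wavelet-transform-based smoothed principal component analysis are superior…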

2.
Multivariate methods, such as principal component analysis (PCA) and multivariate curve resolution (MCR), are often employed to aid the analysis of large complex data sets such as time‐of‐flight secondary ion mass spectrometry (ToF‐SIMS) images. There is, however, much confusion over the most appropriate choice of method for any given application and the effects of data preprocessing, which is exacerbated by the confusing terminologies and the use of jargon in this field. In the present study, a simple model system consisting of a ToF‐SIMS image of an immiscible polymer blend is used to evaluate PCA and MCR in the accurate identification, localisation and quantification of the phase‐separated polymer domains, using four data preprocessing methods (no scaling, normalisation, variance scaling and Poisson scaling). This highlights significant issues and challenges in the quantitative multivariate analysis of mixed organic systems, including the discrimination of chemically significant features from experimental noise, the resolution of weak chemical contributions and potential bias introduced by data preprocessing. Multivariate analysis using Poisson scaling, identified as the most suitable data preprocessing method for both PCA and MCR, demonstrates a marked improvement upon traditional (manual) analysis and provides valuable additional information that is difficult to detect using traditional analysis. Using these results, we present recommendations for the optimum use of multivariate analysis by analysts and provide guidance on selecting the most appropriate methods. Confusing terminology is also clarified. © Crown copyright 2008. Reproduced with the permission of Her Majesty's Stationery Office. Published by John Wiley & Sons, Ltd.

3.
Red ginseng has gradually been found to have pharmacological and physiological effects. It is well known that the most important bioactive components of ginseng are ginsenosides. The nootropic effect of ginsenosides from nine different red ginseng extracts was evaluated here. Nine groups of mice were perfused with different concentrations of the nine red ginseng extracts, respectively, and two groups of mice with distilled water. The nootropic effect of ginsenosides on mice was evaluated with behavior tests and a biochemical indicator study. The extracts were identified by rapid resolution liquid chromatography coupled with quadrupole-time-of-flight mass spectrometry (RRLC-Q-TOF-MS). Furthermore, principal component analysis (PCA) was used to analyze the contribution of chemical components from the different ginseng groups. The extracts with the strongest and the weakest nootropic effects were identified. Notably, extract processing is a very important factor in determining the pharmacological functions of ginseng extracts. In conclusion, the most effective extraction method for ginsenosides was found. A panel of 13 ginsenosides was screened out as chemical markers with nootropic effect, comprising the high-level ginsenosides Ra0, Rb1, Rc, Rb2, Rb3, Re, Rd, and Rg1 and the low-level ginsenosides mRb1, mRc, mRb2, mRd, and F2. Low-level ginsenosides were identified for the first time as possible nootropic compounds. This method may shed light on the fast discovery of bioactive compounds in medicinal plants with low-level compounds.

4.
Recent years have seen the introduction of many surface characterization instruments and other spectral imaging systems that are capable of generating data in truly prodigious quantities. The challenge faced by the analyst, then, is to extract the essential chemical information from this overwhelming volume of spectral data. Multivariate statistical techniques such as principal component analysis (PCA) and other forms of factor analysis promise to be among the most important and powerful tools for accomplishing this task. In order to benefit fully from multivariate methods, the nature of the noise specific to each measurement technique must be taken into account. For spectroscopic techniques that rely upon counting particles (photons, electrons, etc.), the observed noise is typically dominated by ‘counting statistics’ and is Poisson in nature. This implies that the absolute uncertainty in any given data point is not constant; rather, it increases with the number of counts represented by that point. Performing PCA, for instance, directly on the raw data leads to less than satisfactory results in such cases. This paper will present a simple method for weighting the data to account for Poisson noise. Using a simple time‐of‐flight secondary ion mass spectrometry spectrum image as an example, it will be demonstrated that PCA, when applied to the weighted data, leads to results that are more interpretable, provide greater noise rejection and are more robust than standard PCA. The weighting presented here is also shown to be an optimal approach to scaling data as a pretreatment prior to multivariate statistical analysis. Published in 2004 by John Wiley & Sons, Ltd.
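
The abstract above does not spell out its weighting, so the following is only a minimal sketch of one widely used form of Poisson scaling: each count is divided by the square root of the product of its row mean and its column mean. The function name `poisson_scale` and the pixels-by-channels layout are assumptions made for illustration, not details taken from the paper.

```python
import math


def poisson_scale(counts):
    """Scale a pixels-by-channels count matrix for Poisson noise.

    Assumed form of the weighting: each entry is divided by
    sqrt(row_mean * column_mean). Entries whose weight is zero
    (an all-zero row or column) are returned as 0.0.
    """
    n_rows = len(counts)
    n_cols = len(counts[0])
    # Mean count per pixel (row) and per spectral channel (column).
    row_means = [sum(row) / n_cols for row in counts]
    col_means = [sum(counts[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    scaled = []
    for i, row in enumerate(counts):
        scaled_row = []
        for j, value in enumerate(row):
            weight = math.sqrt(row_means[i] * col_means[j])
            scaled_row.append(value / weight if weight > 0 else 0.0)
        scaled.append(scaled_row)
    return scaled, row_means, col_means
```

PCA would then be run on the scaled matrix, and the returned means can be used to map loadings and scores back to the original count scale.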

5.
Wu X, Guo W, Cai W, Shao X, Pan Z. Talanta, 2003, 61(6): 863-869
An effective method for detecting weak analytical signals against a strong noise background is proposed, based on the theory of stochastic resonance (SR). Compared with conventional SR-based algorithms, the proposed algorithm is simplified: only one parameter is changed to achieve weak-signal detection. Simulation studies revealed that the method performs well in detecting analytical signals against a very high level of background noise and is suitable for detecting signals at different noise levels by changing the parameter. Applications of the method to weak experimental signals from X-ray diffraction and Raman spectra are also investigated. It is found that reliable results can be obtained.

6.
Principal component analysis (PCA) and other multivariate analysis methods have been used increasingly to analyse and understand depth profiles in X‐ray photoelectron spectroscopy (XPS), Auger electron spectroscopy (AES) and secondary ion mass spectrometry (SIMS). These methods have proved as useful in fundamental studies as in applied work where speed of interpretation is very valuable. Until now these methods have been difficult to apply to very large datasets such as spectra associated with 2D images or 3D depth‐profiles. Existing algorithms for computing PCA matrices have been either too slow or demanded more memory than is available on desktop PCs. This often forces analysts to ‘bin’ spectra on a much coarser grid than they would like, perhaps even to unity mass bins even though much higher resolution is available, or to select only part of an image for PCA, even though PCA of the full data would be preferred. We apply the new ‘random vectors’ method of singular value decomposition proposed by Halko and co‐authors to time‐of‐flight (ToF)SIMS data for the first time. This increases the speed of calculation by a factor of several hundred, making PCA of these datasets practical on desktop PCs for the first time. For large images or 3D depth profiles we have implemented a version of this algorithm which minimises memory needs, so that even datasets too large to store in memory can be processed into PCA results on an ordinary PC with a few gigabytes of memory in a few hours. We present results from ToFSIMS imaging of a citrate crystal and a basalt rock sample, the largest of which is 134GB in file size corresponding to 67 111 mass values at each of 512 × 512 pixels. This was processed into 100 PCA components in six hours on a conventional Windows desktop PC. © 2015 The Authors. Surface and Interface Analysis published by John Wiley & Sons Ltd.
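
The randomized scheme of Halko and co-authors named in the abstract above is usually summarised as below. This is a sketch of the generic algorithm, not of the authors' memory-minimising implementation. The notation is assumed here: A is the m × n mean-centred data matrix, k the number of components wanted, p a small oversampling margin and Ω a random test matrix.

```latex
\begin{align*}
\Omega &\in \mathbb{R}^{n \times (k+p)}, \quad \Omega_{ij} \sim \mathcal{N}(0,1)
  && \text{(random test vectors)} \\
Y &= A\,\Omega
  && \text{(sample the range of } A\text{)} \\
Y &= QR
  && \text{(orthonormal basis } Q \in \mathbb{R}^{m \times (k+p)}\text{)} \\
B &= Q^{\mathsf T} A
  && \text{(small } (k+p) \times n \text{ matrix)} \\
B &= \tilde{U}\,\Sigma\,V^{\mathsf T}
  && \text{(exact SVD of the small matrix)} \\
A &\approx (Q\tilde{U})\,\Sigma\,V^{\mathsf T}
  && \text{(keep the leading } k \text{ components)}
\end{align*}
```

Since A enters only through the products AΩ and QᵀA, it can in principle be read block by block, which is presumably what makes datasets larger than memory tractable.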

7.
It is common practice in chromatographic purity analysis of pharmaceutical manufacturing processes to assess the quality of peak integration in combination with visual inspection of the chromatogram. This traditional method of visual chromatographic comparison is simple, but is very subjective, laborious and seldom very quantitative. For high-purity drugs it would be particularly difficult to detect the occurrence of an unknown impurity co-eluting with the target compound, which is present in excess compared to any impurity. We hypothesize that this can be achieved through Multivariate Statistical Process Control (MSPC) based on principal component analysis (PCA) modeling. In order to obtain the lowest detection limit, different chromatographic data preprocessing methods such as time alignment, baseline correction and scaling are applied. Historical high performance liquid chromatography (HPLC) chromatograms from a biopharmaceutical in-process analysis are used to build a normal operation condition (NOC) PCA model. Chromatograms with simulated 0.1% impurities added at varied resolutions are exposed to the NOC model and monitored with MSPC charts. This study demonstrates that MSPC based on PCA applied to chromatographic purity analysis is a powerful tool for monitoring subtle changes in the chromatographic pattern, providing clear diagnostics of subtly deviating chromatograms. The procedure described in this study can be implemented and operated as the HPLC analysis runs according to the process analytical technology (PAT) concept aiming for real-time release.
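
The abstract above does not say which statistics its MSPC charts track. In PCA-based monitoring the usual pair is Hotelling's T² and the Q (squared prediction error) statistic, sketched here under that assumption. The symbols are notation introduced for illustration: x is a preprocessed chromatogram, P the loading matrix of the NOC model with A retained components, and λ_a the variance of the a-th score in the NOC set.

```latex
\begin{align*}
\mathbf{t} &= P^{\mathsf T}\mathbf{x}
  && \text{(scores of the new chromatogram)} \\
T^2 &= \sum_{a=1}^{A} \frac{t_a^2}{\lambda_a}
  && \text{(distance from the NOC centre within the model plane)} \\
Q &= \lVert \mathbf{x} - P\,\mathbf{t} \rVert^2
  && \text{(squared residual not captured by the model)}
\end{align*}
```

A small co-eluting impurity changes the peak shape in a way the NOC loadings do not describe, so it would be expected to show up mainly in Q rather than in T².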

8.
Comparatively few studies have explored the ability of Raman spectroscopy for the quantitative analysis of microbial secondary metabolites in fermentation broths. In this study we investigated the ability of Raman spectroscopy to differentiate between different penicillins and to quantify the level of penicillin in fermentation broths. However, the Raman signal is rather weak, so it was enhanced using surface enhanced Raman spectroscopy (SERS) employing silver colloids. It was difficult to differentiate by eye between the five different penicillin molecules studied using Raman and SERS spectra, so the spectra were analysed by multivariate cluster analysis. Principal components analysis (PCA) clearly showed that SERS, rather than Raman, produced spectra reproducible enough to allow for the recovery of each of the different penicillins into their respective five groups. To highlight this further, the first five principal components were used to construct a dendrogram using agglomerative clustering, and this again clearly showed that SERS can be used to identify which penicillin molecule was being analysed, despite their molecular similarities. With respect to the quantification of penicillin G, it was shown that Raman spectroscopy could be used to quantify the amount of penicillin present in solution when relatively high levels of penicillin were analysed (>50 mM). By contrast, the SERS spectra showed reduced fluorescence and improved signal-to-noise ratios at considerably lower concentrations of the antibiotic. This could prove to be advantageous in industry for monitoring low levels of penicillin in the early stages of antibiotic production. In addition, SERS may have advantages for quantifying low levels of high value, low yield, secondary metabolites in microbial processes.

9.
Advances in sensory systems have led to many industrial applications with large amounts of highly correlated data, particularly in chemical and pharmaceutical processes. With these correlated data sets, it becomes important to consider advanced modeling approaches built to deal with correlated inputs in order to understand the underlying sources of variability and how this variability will affect the final quality of the product. In addition to the correlated nature of the data sets, it is also common to find missing elements and noise in these data matrices. Latent variable regression methods such as partial least squares or projection to latent structures (PLS) have gained much attention in industry for their ability to handle ill‐conditioned matrices with missing elements. This feature of the PLS method is accomplished through the nonlinear iterative PLS (NIPALS) algorithm, with a simple modification to consider the missing data. Moreover, in expectation maximization PLS (EM‐PLS), imputed values are provided for missing data elements as initial estimates, conventional PLS is then applied to update these elements, and the process iterates to convergence. This study is the extension of previous work for principal component analysis (PCA), where we introduced nonlinear programming (NLP) as a means to estimate the parameters of the PCA model. Here, we focus on the parameters of a PLS model. As an alternative to modified NIPALS and EM‐PLS, this paper presents an efficient NLP‐based technique to find model parameters for PLS, where the desired properties of the parameters can be explicitly posed as constraints in the optimization problem of the proposed algorithm. We also present a number of simulation studies, where we compare the effectiveness of the proposed algorithm with that of competing algorithms. Copyright © 2014 John Wiley & Sons, Ltd.

10.
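In the preprocessing of nuclear magnetic resonance (NMR) metabolomics data, the main purpose of scaling is to increase the weight of information from characteristic metabolites and to reduce the influence of noise and irrelevant metabolites, thereby making subsequent pattern recognition analysis easier. This paper proposes a new scaling method that does not emphasise bringing all variables to a common scale; instead, working from the original data, it increases the weights of variables that are highly stable and differ significantly between sample classes, so as to enhance the information related to characteristic metabolites. The performance of the new method is evaluated on both simulated and real metabolomics data and compared with commonly used scaling methods such as unit variance scaling (Unit Variance), variable stability scaling (Variable Stability) and level scaling (Level Scaling). The results show that the new method improves the predictive ability of multivariate statistical models, better preserves the molecular information of the NMR spectra, helps identify characteristic metabolites, and makes the results of subsequent data analysis more interpretable.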
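
The three reference methods named in the abstract above can be sketched as follows, using the definitions commonly given in the metabolomics literature. The proposed new method is not reproduced, since the abstract does not give its formula. The function names are hypothetical, and reading "Variable Stability" as the usual VAST scaling is an assumption.

```python
import statistics


def unit_variance(column):
    """Unit variance (auto) scaling: centre, then divide by the standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample standard deviation
    return [(x - mean) / sd for x in column]


def level_scaling(column):
    """Level scaling: centre, then divide by the mean, giving relative changes."""
    mean = statistics.mean(column)
    return [(x - mean) / mean for x in column]


def vast_scaling(column):
    """Variable stability (VAST) scaling: unit variance scaling multiplied by
    mean / sd, so that variables with low relative variation gain weight."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd * (mean / sd) for x in column]
```

Each function takes one variable (one column of the data matrix, a list of floats across samples). The column must not be constant, and for level scaling its mean must be non-zero.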

11.
Many complex natural or synthetic products are analysed either by the GC–MS (gas chromatography–mass spectrometry) or HPLC–DAD (high performance liquid chromatography–diode-array detector) technique, each of which produces a one-dimensional fingerprint for a given sample. This may be used for classification of different batches of a product. GC–MS and HPLC–DAD data for complex, similar substances, represented by the three common types of the TCM (traditional Chinese medicine) Rhizoma Curcumae, were analysed in the form of one- and two-dimensional matrices, first with the use of PCA (principal component analysis), which showed a reasonable separation of the samples for each technique. However, the separation patterns were rather different for each analytical method, and PCA of the combined data matrix showed improved discrimination of the three types of object; close associations between the GC–MS and HPLC–DAD variables were observed. LDA (linear discriminant analysis), BP-ANN (back propagation-artificial neural networks) and LS-SVM (least squares-support vector machine) chemometrics methods were then applied to classify the training and prediction sets. For one-dimensional matrices, all training models indicated that several samples would be misclassified; the same was observed for each prediction set. However, by comparison, in the analysis of the combined matrix, all models gave 100% classification with the training set, and the LS-SVM calibration also produced a 100% result for prediction, with the BP-ANN calibration closely behind. This has important implications for comparing complex substances such as the TCMs because clearly the one-dimensional data matrices alone produce inferior results for training and prediction as compared to the combined data matrix models. Thus, product samples may be misclassified with the use of the one-dimensional data because of insufficient information.

12.
Electrical impedance gives multivariate complex number data as results. Two examples of multivariate electrical impedance data measured on lipid monolayers in different solutions give rise to matrices (16 × 50 and 38 × 50) of complex numbers. Multivariate data analysis by principal component analysis (PCA) or singular value decomposition (SVD) can be used for complex data and the necessary equations are given. The scores and loadings obtained are vectors of complex numbers. It is shown that the complex number PCA and SVD are better at concentrating information in a few components than the naïve juxtaposition method and that Argand diagrams can replace score and loading plots. Different concentrations of Magainin and Gramicidin A give different responses and also the role of the electrolyte medium can be studied. An interaction of Gramicidin A in the solution with the monolayer over time can be observed.

13.
An important feature of experimental science is that data of various kinds are being produced at an unprecedented rate. This is mainly due to the development of new instrumental concepts and experimental methodologies. It is also clear that the nature of the acquired data is significantly different. Indeed, in every area of science, data take the form of ever bigger tables, where all but a few of the columns (i.e. variables) turn out to be irrelevant to the questions of interest, and, further, we do not necessarily know which coordinates are the interesting ones. Big data in our biology, analytical chemistry or physical chemistry labs is a future that might be closer than any of us suppose. It is in this sense that new tools have to be developed in order to explore and valorize such data sets. Topological data analysis (TDA) is one of these. It was developed recently by topologists who discovered that topological concepts could be useful for data analysis. The main objective of this paper is to answer the question of why topology is well suited to the analysis of big data sets in many areas, and even more efficient than conventional data analysis methods. Raman analysis of single bacteria should provide a good opportunity to demonstrate the potential of TDA for the exploration of various spectroscopic data sets under different experimental conditions (with high noise level, with/without spectral preprocessing, with wavelength shift, with different spectral resolution, with missing data).

14.
15.
A water quality data set from the alluvial region in the Gangetic plain in northern India, which is known for high fluoride levels in soil and groundwater, has been analysed by chemometric techniques, such as principal component analysis (PCA), discriminant analysis (DA) and partial least squares (PLS), in order to investigate the compositional differences between surface and groundwater samples, spatial variations in groundwater composition and the influence of natural and anthropogenic factors. Trilinear plots of major ions showed that the groundwater in this region is mainly of Na/K-bicarbonate type. PCA performed on the complete data matrix yielded six significant PCs explaining 65% of the data variance. Although PCA rendered considerable data reduction, it could not clearly group and distinguish the sample types (dug well, hand-pump and surface water). However, a visible differentiation between the water samples pertaining to two watersheds (Khar and Loni) was obtained. DA identified six discriminating variables between surface and groundwater and also between different types of samples (dug well, hand pump and surface water). Distinct grouping of the surface and groundwater samples was achieved using the PLS technique. It further showed that the groundwater samples are dominated by variables having origin both in natural and anthropogenic sources in the region, whereas variables of industrial origin dominate the surface water samples. It also suggested that the groundwater sources are contaminated with various industrial contaminants in the region.

16.
A method for the treatment of long-dimensional chemical data arrays is presented in this work with the aim of maximising classification models. The method is based on the construction of fingerprints and the subsequent generation of a similarity matrix. The similarity calculation has been modified through a scaling process to take into account the different significance shown by the variables. The method was applied to spectral measurements of wines and several aspects were studied, namely the threshold considered in the construction of fingerprints and patterns, the weighting factor used for scaling, the normalisation method, etc. The application of both Principal Components Analysis and Soft-Independent Modelling of Class Analogies to the similarity matrices gave better classifications of the information than those obtained using the original data.

17.
In environmental chemistry studies, it may be necessary to analyze data sets constituted by different blocks of variables, possibly of different types, measured on the same samples. Multiple factor analysis (MFA) is presented as a tool for exploring such data. The most important features of MFA are shown on a real environmental data set, consisting of two blocks of data, namely heavy metals and polycyclic aromatic hydrocarbons, measured for sediment samples. They are discussed and compared to principal component analysis (PCA). The usefulness of the weighting scheme used in MFA as a preprocessing step for other chemometric methods, such as clustering, is also highlighted.

18.
A method is proposed for the determination of chromatographic peak purity by means of principal component analysis (PCA) of high-performance liquid chromatography with diode array detection (HPLC-DAD) data. The method is exemplified with analysis of binary mixtures of lidocaine and prilocaine with different levels of separation. Lidocaine and prilocaine have very similar spectra and the chromatograms used had substantial peak overlap. The samples analysed contained a constant amount of lidocaine and a minor amount of prilocaine (0.02-2 conc.%) and hence the focus was on determining the purity of the lidocaine peak in the presence of much smaller levels of prilocaine. The peak purity determination was made by examination of relative observation residuals, scores and loadings from the PCA decomposition of DAD data over a chromatographic peak. As a reference method, the functions for peak purity analysis in the chromatographic data system used (Chromeleon) were applied. The PCA method showed good results at the same level as the detection limit of baseline-separated prilocaine, outperforming the methods in Chromeleon by a factor of ten. There is a discussion of the interpretation of the result, with some comparisons with evolving factor analysis (EFA). The main advantage of the PCA method for determination of peak purity over methods like EFA lies in its simplicity, short time of calculation and ease of use.

19.
Second-order global hard-modelling was applied to resolve the complex formation between Co2+, Ni2+, and Cd2+ cations and 1,10-phenanthroline. The highly correlated spectral and concentration profiles of the species in these systems, and the low concentration of some species in the individual collected data matrices, prevent good resolution of the profiles. Therefore, a collection of six equilibrium data matrices, comprising series of absorption spectra taken with pH changes at different reactant ratios, was analyzed. Firstly, a precise principal component analysis (PCA) of different augmented arrangements of the individual data matrices was used to determine the number of species involved in the equilibria. Based on the results of PCA, the equilibria included in the data were specified, and second-order global hard-modelling of the appropriate arrangement of the six collected equilibrium data matrices resulted in well-resolved profiles and equilibrium constants. The protonation constant of the ligand (1,10-phenanthroline) and the spectral profiles of its protonated and unprotonated forms are additional information obtained by global analysis. For comparison, multivariate curve resolution-alternating least squares (MCR-ALS) was applied to the same data. The results showed that second-order global hard-modelling is more convenient than MCR-ALS, especially for systems with a completely known model. It can completely resolve the system and gives concentration profiles that are closer to the correct ones. Moreover, the goodness-of-fit parameters are better with second-order global hard-modelling.

20.
In the Tucker3 model of N-way principal components analysis (NPCA), a so-called core matrix describes the possible interactions between components from different modes. For an easy interpretation of solutions, it is necessary to have as few interactions as possible (in conventional PCA of data tables, such interactions can always be avoided). This goal may be realized by various approaches of core matrix transformations. At the same time, it is desirable to have simple component (or loading) matrices. Usually, the simplicity of the core conflicts to a certain degree with the simplicity of the components. The paper demonstrates how the conditional optimization of both goals can be used to find a compromise. For the purpose of illustration, the procedure is first applied to a small three-way data array from heavy metal analysis of tissues in different samples of game. Later, a data array of bigger size from a three-way interlaboratory study is considered. Received: 21 November 1997 / Revised: 11 February 1998 / Accepted: 15 February 1998
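
For reference, the standard three-way Tucker3 decomposition that the abstract above refers to can be written as follows. The symbols are the conventional ones and are assumed here rather than taken from the paper: a, b and c are the component (loading) matrices of the three modes, g is the core array and e the residual.

```latex
x_{ijk} \;=\; \sum_{p=1}^{P}\sum_{q=1}^{Q}\sum_{r=1}^{R}
  a_{ip}\, b_{jq}\, c_{kr}\, g_{pqr} \;+\; e_{ijk}
```

Every non-zero core element g_{pqr} links component p of the first mode to component q of the second and component r of the third, so a core with few non-zero elements means few interactions to interpret. In two-way PCA the model reduces to x_{ij} = Σ_p a_{ip} b_{jp} + e_{ij}, where each component pairs only with itself, which is why such interactions can always be avoided there.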
