Similar documents
20 similar documents found
1.
In multivariate data analysis such as principal components analysis (PCA) and projections to latent structures (PLS), it is essential that the training set systems (objects) are selected to provide data with substantial information for model parametrization, and to properly represent any future situations where the multivariate model is used for predictions. In the framework of multivariate projections (PCA, SIMCA and PLS), elementary concepts of statistical design (fractional factorials and composite designs) can be used with the latent variables (PC or PLS scores) as design variables. The plan of action thus becomes: (1) problem formulation (specify aim and model, make a conceptual division of the investigated system into subsystems); (2) collection of multivariate data for each type of subsystem; (3) estimation of the practical dimensionality of the data for each type of subsystem by PC or PLS analysis; (4) use of the PC or PLS scores (t) as design variables when combining subsystems into the systems of the training set; (5) measurement of responses (Y); (6) analysis of data by PCA or PLS; (7) interpretation of results with possible feedback to steps 1, 2 or 3. The procedures are illustrated by two problems: a structure/activity relationship for a family of peptides, and optimization of an organic synthesis with respect to system variables (solvent, substrate, co-reactant) and process variables (temperature, reactant concentrations).
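A minimal Python sketch of step (4), assuming NumPy is available: each candidate subsystem is described by a row of descriptors, PCA scores are computed from the autoscaled matrix, and the candidate closest to each corner of a two-level full factorial in the first k scores is picked. The helper names and the nearest-candidate rule are illustrative assumptions, not the authors' procedure; a fractional factorial or composite design would only change the list of design points.

```python
import itertools

import numpy as np


def pca_scores(X, n_components):
    """Scores (t) of the first n_components PCs of the autoscaled matrix X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def select_by_factorial(X, n_components=2):
    """Hypothetical helper: one candidate per corner of a 2^k design in score space.

    Each score column is scaled to unit standard deviation, and the candidate
    nearest to every (+/-1, ..., +/-1) design point is returned. The same
    candidate may be returned twice if it is nearest to two corners.
    """
    T = pca_scores(X, n_components)
    T = T / T.std(axis=0, ddof=1)
    chosen = []
    for corner in itertools.product([-1.0, 1.0], repeat=n_components):
        distances = np.linalg.norm(T - np.array(corner), axis=1)
        chosen.append(int(np.argmin(distances)))
    return chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    descriptors = rng.normal(size=(30, 6))  # 30 candidate subsystems, 6 descriptors
    print(select_by_factorial(descriptors, n_components=2))
```

The selected subsystems would then be combined into training-set systems, and their responses (Y) measured as in step (5).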

2.
Multivariate data analysis methods (Principal Component Analysis (PCA) and Partial Least Squares (PLS)) are applied to the analysis of CoMFA (Comparative Molecular Field Analysis) data for several nucleic acid components. The data set includes nitrogenous bases, nucleosides, linear nucleotides, 3',5'-cyclic nucleotides and oligonucleotides. PCA is applied to study the structure of the CoMFA data and to detect possible outliers in the data set. PLS is applied to correlate the CoMFA data with either calculated AM1 proton affinities or experimental pKa values. It is also shown that pKa values of polynucleotides can be predicted directly from the 3D structures of their monomers. The influence of the superposition criteria and of conformational changes around the glycosidic bond on the pKa prediction is studied as well.
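PCA-based outlier screening is often done with two per-sample statistics: Hotelling's T² (distance within the model plane) and Q, the squared residual off the model. The abstract does not say which diagnostics were used, so the NumPy sketch below is a generic assumption rather than the authors' procedure, and control limits for flagging a sample are left out.

```python
import numpy as np


def pca_outlier_stats(X, n_components):
    """Return Hotelling's T^2 and Q (squared residual) for each row of X.

    Generic diagnostics, not necessarily those used in the study.
    X is mean-centred here; autoscaling first is a common alternative.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]          # PC scores
    variances = s[:n_components] ** 2 / (len(X) - 1)     # variance of each score column
    t2 = np.sum(T ** 2 / variances, axis=1)
    residual = Xc - T @ Vt[:n_components]                 # part not explained by the model
    q = np.sum(residual ** 2, axis=1)
    return t2, q
```

A large T² marks a sample that is extreme but still consistent with the model; a large Q marks a sample the model does not describe well.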

3.
Research has been carried out to determine the potential of partial least squares (PLS) modeling of mid-infrared (IR) spectra of crude oils, combined with the corresponding ¹H and ¹³C nuclear magnetic resonance (NMR) data, to predict the long residue (LR) properties of these substances. The study builds on a recently developed and patented method that predicts this type of information from IR spectra alone. In the present study, PLS modeling was carried out for 7 different LR properties: yield long-on-crude (YLC), density (DLR), viscosity (VLR), sulfur content (S), pour point (PP), asphaltenes (Asph) and carbon residue (CR). The work was based on the spectra of 48 crude oil samples, of which 28 were used to build the PLS models and the remaining 20 for validation. For each property, PLS modeling was carried out on single-type IR, ¹³C NMR and ¹H NMR spectra and on 3 sets of merged spectra: IR + ¹H NMR, IR + ¹³C NMR and IR + ¹H NMR + ¹³C NMR. The merged spectra were created by treating the NMR data as a scaled extension of the IR spectral region. In addition, PLS modeling of coupled spectra was performed after a principal component analysis (PCA) of the IR, ¹³C NMR and ¹H NMR calibration sets; for these models, the 10 most relevant PCA scores of each set were concatenated and scaled prior to PLS modeling. The validation results of the individual IR models, expressed as root-mean-square error of prediction (RMSEP), turned out to be slightly better than those of the models using ¹³C NMR or ¹H NMR data alone. For the models combining IR spectra with NMR data, no significant improvement in RMSEP was observed, either for the models based on merged spectra or for those based on the PCA scores. This implies that the commonly accepted complementary character of NMR and IR is, at least for the crude oil and bitumen samples under study, not reflected in the results of PLS modeling.
Given these results, and because IR spectroscopy requires no sample preparation and offers straightforward data acquisition, IR is preferred over NMR for on-site prediction of the LR properties of crude oils.
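The two data-fusion routes described above can be sketched in NumPy. The abstract does not specify the scaling, so equal block weight (unit total sum of squares per mean-centred block) is an assumption, as is reading "the 10 most relevant PCA scores" as the first 10; the function names are hypothetical. The resulting matrices would then be passed to a PLS routine, and RMSEP computed on the validation samples.

```python
import numpy as np


def block_scale(block):
    """Mean-centre a spectral block and scale it to unit total sum of squares.

    Assumption: equal block weight; the abstract does not give the scaling used.
    """
    centred = block - block.mean(axis=0)
    return centred / np.sqrt(np.sum(centred ** 2))


def merge_blocks(*blocks):
    """Append scaled blocks side by side, e.g. IR followed by 1H NMR."""
    return np.hstack([block_scale(b) for b in blocks])


def score_fusion(*blocks, n_scores=10):
    """Concatenate the first n_scores PCA scores of each block, then autoscale."""
    parts = []
    for b in blocks:
        centred = b - b.mean(axis=0)
        U, s, _ = np.linalg.svd(centred, full_matrices=False)
        parts.append(U[:, :n_scores] * s[:n_scores])
    T = np.hstack(parts)
    return (T - T.mean(axis=0)) / T.std(axis=0, ddof=1)


def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction over a validation set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

In a real calibration/validation split, the centring, scaling and PCA loadings estimated on the 28 calibration samples would be applied unchanged to the 20 validation spectra; the sketch processes a single matrix only.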

5.
It is well known that orthogonal projections to latent structures (OPLS) and partial least squares regression (PLS1) give identical predictions in the single-response case. The present paper presents an approach to identifying the complete y-orthogonal structure, starting from the viewpoint of standard PLS1 regression. Three alternative non-deflating OPLS algorithms and a modified principal component analysis (PCA)-driven method (including MATLAB code) are presented. The first algorithm implements a postprocessing routine of the standard PLS1 solution, in which QR factorization applied to a shifted version of the non-orthogonal scores is the key to expressing the OPLS solution. The second algorithm finds the OPLS model directly by an iterative procedure. By a rigorous mathematical argument, we explain that orthogonal filtering is a ‘built-in’ property of the traditional PLS1 regression coefficients. Consequently, OPLS offers no improvement over PLS1 with respect to prediction (also for new samples). The PCA-driven method is based on the fact that truncating one dimension from the row subspace of X results in a matrix X_orth with y-orthogonal columns and a rank one less than the rank of X. The desired truncation corresponds exactly to the first X deflation step of Martens' non-orthogonal PLS algorithm. The significant y-orthogonal structure of X found by PCA of X_orth is split into two fundamental parts: one part that contributes significantly to correcting the first PLS score toward y, and one part that does not. The third and final OPLS algorithm presented is a modification of Martens' non-orthogonal algorithm into an efficient dual PLS1-OPLS algorithm. Copyright © 2014 John Wiley & Sons, Ltd.
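The key fact behind the PCA-driven method can be checked numerically. The NumPy sketch below, assuming column-centred X and y, takes the first PLS1 weight vector w = Xᵀy/‖Xᵀy‖, forms X_orth = X − (Xw)wᵀ as in the first deflation step of a weight-based (non-orthogonal) PLS algorithm, and confirms that every column of X_orth is orthogonal to y and that the rank drops by one. It is an illustration only, not the paper's MATLAB code, and the function names are hypothetical.

```python
import numpy as np


def first_martens_deflation(X, y):
    """First X-deflation step with the PLS1 weight vector (X, y column-centred).

    Illustrative helper, not the paper's code.
    w = X^T y / ||X^T y||, t = X w and X_orth = X - t w^T.
    Then y^T X_orth = (X^T y)^T - ||X^T y|| w^T = 0, so every column of
    X_orth is orthogonal to y, and the rank falls by one.
    """
    w = X.T @ y
    w = w / np.linalg.norm(w)
    t = X @ w                       # first PLS score
    X_orth = X - np.outer(t, w)     # remove direction w from the row space of X
    return w, X_orth


def pca_of_orthogonal_part(X_orth, n_components):
    """Scores and loadings of the leading y-orthogonal components."""
    U, s, Vt = np.linalg.svd(X_orth, full_matrices=False)
    return U[:, :n_components] * s[:n_components], Vt[:n_components].T
```

Because y is orthogonal to every column of X_orth, the PCA scores of X_orth are orthogonal to y as well; splitting them into the part that corrects the first PLS score and the part that does not is the further step described in the abstract and is not shown here.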

8.
In spectroscopy, the measured spectra are typically plotted as a function of wavelength (or wavenumber) but analysed with multivariate data analysis techniques (multiple linear regression (MLR), principal components regression (PCR), partial least squares (PLS)) that treat the spectrum as a set of m different variables. From a physical point of view, it could be more informative to describe the spectrum as a function rather than as a set of points, thereby taking into account its physical background: a spectrum is a sum of absorption peaks of the different chemical components, and the absorbances at two nearby wavelengths are highly correlated. In the first part of this contribution, a motivating example for this functional approach is given. In the second part, the potential of functional data analysis in chemometrics is discussed and compared with the ubiquitous PLS regression technique using two practical data sets. It is shown that for spectral data, B-splines are an appealing basis for accurately describing the data. When both functional data analysis and PLS are applied to the data sets, the predictive ability of functional data analysis is found to be comparable to that of PLS. Moreover, many chemometric datasets have a specific structure (e.g. replicate measurements on the same object, or objects that are grouped), but this structure is often removed before analysis (e.g. by averaging the replicates). In the second part of this contribution, we suggest a method to adapt traditional analysis of variance (ANOVA) methods to spectroscopic datasets. In particular, the possibilities to explore and interpret sources of variation, such as variations in sample and ambient temperature, are examined. Copyright © 2008 John Wiley & Sons, Ltd.
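A minimal NumPy sketch of representing one spectrum in a cubic B-spline basis: the basis is evaluated with the Cox-de Boor recursion on a clamped knot vector with equally spaced interior knots, and the coefficients are found by least squares. The knot placement, the number of knots and the function names are assumptions; the paper's functional regression and ANOVA models are not reproduced.

```python
import numpy as np


def bspline_basis(x, interior_knots, degree=3):
    """Evaluate every B-spline basis function at the points x (Cox-de Boor recursion).

    The knot vector is clamped: the boundary knots (min and max of x) are
    repeated degree + 1 times. Returns an array of shape
    (len(x), len(interior_knots) + degree + 1).
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    t = np.concatenate([[lo] * (degree + 1), np.asarray(interior_knots, float), [hi] * (degree + 1)])
    # Degree 0: indicator of each half-open knot span [t_i, t_{i+1}).
    B = np.zeros((len(x), len(t) - 1))
    for i in range(len(t) - 1):
        if t[i] < t[i + 1]:
            B[:, i] = (x >= t[i]) & (x < t[i + 1])
    # Close the last non-empty span so that x == hi is covered.
    last = np.nonzero(t[:-1] < t[1:])[0][-1]
    B[x == hi, last] = 1.0
    # Raise the degree one step at a time.
    for k in range(1, degree + 1):
        B_next = np.zeros((len(x), len(t) - 1 - k))
        for i in range(len(t) - 1 - k):
            left = t[i + k] - t[i]
            right = t[i + k + 1] - t[i + 1]
            if left > 0:
                B_next[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                B_next[:, i] += (t[i + k + 1] - x) / right * B[:, i + 1]
        B = B_next
    return B


def fit_spectrum(wavelengths, absorbance, n_interior=20, degree=3):
    """Least-squares B-spline coefficients and fitted curve for one spectrum.

    Assumption: equally spaced interior knots.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    interior = np.linspace(wavelengths.min(), wavelengths.max(), n_interior + 2)[1:-1]
    B = bspline_basis(wavelengths, interior, degree)
    coef, *_ = np.linalg.lstsq(B, absorbance, rcond=None)
    return coef, B @ coef
```

The coefficient vectors (one per spectrum) replace the m absorbance values as the description of each sample, which is the starting point for functional regression.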

9.
Vibrational Spectroscopy, 2010, 52(2): 205-212

10.
The quantitative structure-activity relationship of a set of 19 flavonoid compounds with antioxidant activity was studied by means of PLS (Partial Least Squares) regression. The structures were optimized and their electronic properties calculated with the semiempirical AM1 method. A reliable model (r² = 0.806 and q² = 0.730) was obtained, and from this model it was possible to identify some structural aspects of the flavonoid compounds studied that are related to their free radical scavenging ability. The quality of the PLS model obtained in this work indicates that it can be used to design new flavonoid compounds able to scavenge free radicals.
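The r² and q² values quoted above measure fit and leave-one-out predictive ability, respectively. A minimal NumPy sketch, assuming q² is the usual leave-one-out 1 − PRESS/SS: any regression can be plugged in through a fit-and-predict function, and ordinary least squares is used here only as a stand-in for the PLS model. The function names are hypothetical.

```python
import numpy as np


def loo_q2(X, y, fit_predict):
    """Leave-one-out q^2 = 1 - PRESS / SS_tot.

    fit_predict(X_train, y_train, X_test) must return predictions for X_test.
    """
    y = np.asarray(y, dtype=float)
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        y_hat = fit_predict(X[keep], y[keep], X[i:i + 1])
        press += float((y[i] - y_hat[0]) ** 2)
    return 1.0 - press / float(np.sum((y - y.mean()) ** 2))


def r2(y, y_fit):
    """Coefficient of determination of fitted values."""
    y = np.asarray(y, dtype=float)
    return 1.0 - float(np.sum((y - y_fit) ** 2) / np.sum((y - y.mean()) ** 2))


def least_squares_fit_predict(X_train, y_train, X_test):
    """Stand-in regression (OLS with intercept); a PLS model would slot in here."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef
```

Because each left-out sample is predicted by a model that never saw it, q² is normally lower than r², as in the values reported above.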

12.
Analytical Letters, 2012, 45(1): 122-139
Abstract

Species of Garcinia (Guttiferae) are used for flavoring curries, as a supplement, and to treat various diseases. This study describes the comparison and discrimination of Garcinia cambogia, Garcinia indica, Garcinia mangostana and Garcinia atroviridis fruits by analyzing their major phytochemicals, elemental content, and antioxidant, antidiabetic and anticholinesterase activities. For phytochemical and elemental profiling, ultraviolet (UV), near-infrared/infrared (NIR/IR), inductively coupled plasma optical emission spectroscopy (ICP-OES) and inductively coupled plasma mass spectrometry (ICP-MS) techniques were used. The chemometric multivariate methods linear discriminant analysis (LDA) and principal component analysis (PCA) were used to discriminate the fruit samples. Spectroscopic data showed signals of phenolic and flavonoid constituents present in the fruits. G. mangostana exhibited the highest phenolic content (721.6 to 2815.3 µM GAE/g), whereas G. cambogia was rich in flavonoids (51.9 to 2709.2 µM QE/g). The anthocyanin cyanidin-3-O-glucoside, determined by high-performance liquid chromatography, was 9.01 mg/kg in G. mangostana fruit. In the analyzed fruits, Ca, K and Na were high, trace essential elements were present at appreciable levels, and the toxic elements As, Cd, Tl and Pb were within safe limits. G. mangostana contained potent free radical scavengers and cholinesterase inhibitors, whereas G. cambogia inhibited the α-amylase enzyme more significantly. PCA and LDA discriminated the fruit samples with distinct classification and variability indices. The analyzed fruits were shown to be good sources of free radical scavengers, cholinesterase and α-amylase inhibitors, and minerals and essential elements, and to be safe for human consumption.

13.
The selection of an appropriate calibration set is a critical step in multivariate method development. In this work, the effect on partial least squares (PLS) regression model performance of using different calibration sets, based on a prior classification of the unknown samples, is discussed. As an example, attenuated total reflection (ATR) mid-infrared spectra of deep-fried vegetable oil samples from three botanical origins (olive, sunflower, and corn oil), with polymerized triacylglyceride (PTG) content increased by a deep-frying process, were employed. A one-class classifier based on partial least squares discriminant analysis (PLS-DA), combined with a rooted binary directed acyclic graph tree, provided accurate oil classification. Oil samples fried without foodstuff could be classified correctly, independently of their PTG content. However, class separation of oil samples fried with foodstuff was less evident. Double cross model validation combined with permutation testing was used to validate the PLS-DA classification models, confirming the results. To assess the usefulness of selecting an appropriate PLS calibration set, the PTG content was determined with PLS models based on the previously selected classes. Compared with a PLS model calculated using a pooled calibration set containing samples from all classes, the root mean square error of prediction improved significantly with PLS models based on the calibration sets selected by PLS-DA, ranging between 1.06 and 2.91% (w/w).
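Permutation testing asks how often a classifier built on randomly shuffled class labels performs as well as the real one. The sketch below shows that logic with NumPy, using a leave-one-out nearest-centroid classifier as a deliberately simple stand-in for PLS-DA; the double cross model validation used in the paper is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def loo_nearest_centroid_accuracy(X, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for PLS-DA)."""
    n = len(X)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        classes = np.unique(labels[train])
        centroids = np.array([X[train & (labels == c)].mean(axis=0) for c in classes])
        predicted = classes[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]
        correct += int(predicted == labels[i])
    return correct / n


def permutation_test(X, labels, n_perm=99, seed=0):
    """Observed accuracy and permutation p-value (share of shuffled runs doing as well)."""
    rng = np.random.default_rng(seed)
    observed = loo_nearest_centroid_accuracy(X, labels)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)
        if loo_nearest_centroid_accuracy(X, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The "+1" in numerator and denominator counts the unpermuted labelling as one of the possible arrangements, so the smallest attainable p-value with 99 permutations is 0.01.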

15.
A water quality data set from the alluvial region of the Gangetic plain in northern India, which is known for high fluoride levels in soil and groundwater, has been analysed by chemometric techniques such as principal component analysis (PCA), discriminant analysis (DA) and partial least squares (PLS), in order to investigate the compositional differences between surface and groundwater samples, spatial variations in groundwater composition, and the influence of natural and anthropogenic factors. Trilinear plots of major ions showed that the groundwater in this region is mainly of the Na/K-bicarbonate type. PCA performed on the complete data matrix yielded six significant PCs explaining 65% of the data variance. Although PCA achieved considerable data reduction, it could not clearly group and distinguish the sample types (dug well, hand pump and surface water). However, a visible differentiation was obtained between the water samples from two watersheds (Khar and Loni). DA identified six discriminating variables between surface and groundwater, and also between the different sample types (dug well, hand pump and surface water). Distinct grouping of the surface and groundwater samples was achieved using the PLS technique. It further showed that the groundwater samples are dominated by variables originating from both natural and anthropogenic sources in the region, whereas variables of industrial origin dominate the surface water samples. It also suggested that the groundwater sources in the region are contaminated with various industrial contaminants.
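How many PCs are "significant" and how much variance they explain can be read from the eigenvalues of the correlation matrix. In the NumPy sketch below the data are autoscaled and the eigenvalue-greater-than-one (Kaiser) rule is used to count PCs; the abstract does not state which criterion was applied, so that rule and the function names are assumptions.

```python
import numpy as np


def pca_explained_variance(X):
    """Eigenvalues of the correlation matrix and cumulative explained-variance ratio."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscale each variable
    _, s, _ = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = s ** 2 / (len(Z) - 1)
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    return eigenvalues, cumulative


def kaiser_count(eigenvalues):
    """Number of PCs whose eigenvalue exceeds 1 (assumed selection rule)."""
    return int(np.sum(eigenvalues > 1.0))
```

Autoscaling is the usual choice when variables are measured in different units; without it, the variables with the largest numerical spread dominate the first PCs.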

16.
In this work, multivariate calibration models based on mid- and near-infrared spectroscopy were developed to determine the biodiesel content of diesel fuel blends, considering the presence of raw vegetable oil. Soybean, castor and used frying oils and their corresponding esters were used to prepare blends with conventional diesel. The results indicated that partial least squares (PLS) models based on MID or NIR spectra are suitable practical analytical methods for predicting the biodiesel content of conventional diesel blends in the volume fraction range from 0% to 5%. The PLS models were validated with an independent prediction set, and the RMSEPs were estimated as 0.25 and 0.18 (%, v/v). Linear correlations were observed in the predicted vs. observed plots, with correlation coefficients (R) of 0.986 and 0.994 for the MID and NIR models, respectively. Additionally, principal component analysis (PCA) in the MID region from 1700 to 1800 cm⁻¹ was suitable for identifying raw vegetable oil contamination and illegal blends of petrodiesel containing raw vegetable oil instead of ester.

17.
Advances in sensory systems have led to many industrial applications with large amounts of highly correlated data, particularly in chemical and pharmaceutical processes. With such correlated data sets, it becomes important to consider advanced modeling approaches designed to handle correlated inputs, in order to understand the underlying sources of variability and how this variability affects the final quality of the product. In addition to the correlated nature of the data sets, missing elements and noise are also common in these data matrices. Latent variable regression methods such as partial least squares or projection to latent structures (PLS) have gained much attention in industry for their ability to handle ill-conditioned matrices with missing elements. In PLS this is accomplished through the nonlinear iterative PLS (NIPALS) algorithm, with a simple modification to account for the missing data. Alternatively, in expectation maximization PLS (EM-PLS), imputed values are provided as initial estimates for the missing data elements, conventional PLS is then applied to update these elements, and the process iterates to convergence. This study extends previous work on principal component analysis (PCA), in which we introduced nonlinear programming (NLP) as a means to estimate the parameters of the PCA model. Here, we focus on the parameters of a PLS model. As an alternative to modified NIPALS and EM-PLS, this paper presents an efficient NLP-based technique to find the model parameters of PLS, in which the desired properties of the parameters can be explicitly posed as constraints in the optimization problem. We also present a number of simulation studies comparing the effectiveness of the proposed algorithm with competing algorithms. Copyright © 2014 John Wiley & Sons, Ltd.
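The EM-style loop described above (fill missing entries with initial estimates, fit the model, replace the missing entries with model values, repeat until they stop changing) can be sketched in NumPy. For brevity the model here is a rank-k PCA reconstruction, i.e. the PCA case mentioned in the abstract rather than EM-PLS, and it is not the NLP-based method the paper proposes; the function name and defaults are assumptions.

```python
import numpy as np


def em_lowrank_impute(X, rank, n_iter=500, tol=1e-12):
    """Impute NaN entries by alternating a rank-k PCA fit and re-imputation.

    Hypothetical helper: a PCA stand-in for the EM-PLS loop, not the NLP method.
    """
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)   # initial estimates: column means
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        model = mean + (U[:, :rank] * s[:rank]) @ Vt[:rank]
        change = np.sum((filled[missing] - model[missing]) ** 2)
        filled[missing] = model[missing]                     # update only the gaps
        if change < tol:
            break
    return filled
```

Observed entries are never altered; only the missing cells move, and the loop stops once their squared change between iterations falls below `tol`.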

18.
This paper focuses on the application of principal component analysis (PCA) to facilitate the optimization of the derivatization of the oestrogenic steroids estrone, 17β-estradiol, estriol and 17α-ethinylestradiol, together with diethylstilbestrol, in order to achieve (1) complete derivatization of all the hydroxyl groups in the structures of the compounds and (2) the greatest effectiveness of this reaction. Six different derivatization reagents were used in this study, and 2-methylanthracene was applied as the internal standard to evaluate the effectiveness of the reactions. The experimental data were subjected to PCA, which reduced the dimensionality of the original multivariable data set and facilitated the selection of optimum derivatization conditions. A mixture of 99% N,O-bis(trimethylsilyl)trifluoroacetamide + 1% trimethylchlorosilane and pyridine (1:1, v/v) at 60 °C for 30 min was established as the most convenient and efficient means of derivatizing the aforementioned oestrogenic steroids and diethylstilbestrol; the N-methyl-N-(trimethylsilyl)trifluoroacetamide + pyridine (1:1, v/v) mixture seems to be a promising alternative. The application of PCA for optimizing the derivatization procedure, proposed for the first time in this study, is particularly useful in the development of multicomponent methods covering several chemical classes of compounds. Copyright © 2011 John Wiley & Sons, Ltd.

19.
This work explores a novel method for rearranging first-order (one-way) infrared (IR) and/or near-infrared (NIR) spectra into a representation suitable for multi-way modelling and analysis. The method is based on the fact that the fundamental IR absorptions and the first, second and subsequent overtones in the NIR represent identical chemical information. It is therefore possible to rearrange these overtone regions of an IR and NIR spectrum vector into a matrix in which the fundamental, first, second and subsequent overtones of the spectrum form either the rows or the columns, resulting in a true three-way tensor of data for several samples. This tensorization facilitates explorative analysis and modelling with multi-way methods, for example parallel factor analysis (PARAFAC), N-way partial least squares (N-PLS) and Tucker models. The vibrational overtone combination spectroscopy (VOCSY) arrangement is shown to benefit from the "order advantage", producing more robust, stable and interpretable models than, for example, traditional PLS modelling. The proposed method also opens the field of NIR to true peak decomposition, a feature unique to the method, because the latent factors acquired using PARAFAC can represent pure spectral components, whereas latent factors in principal component analysis (PCA) and PLS usually do not.
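The rearrangement can be sketched in NumPy: given spectra on a common wavenumber axis and a list of (low, high) bounds for the fundamental and overtone regions, each region is resampled onto the same number of points and the regions are stacked, giving a samples × regions × points array that a multi-way method such as PARAFAC or N-PLS could take as input. The region bounds in the test are hypothetical harmonic multiples, and linear interpolation onto equally spaced points is an assumption; real overtone positions shift because of anharmonicity.

```python
import numpy as np


def vocsy_tensor(wavenumbers, spectra, regions, n_points=100):
    """Stack fundamental/overtone regions into a (samples, regions, points) array.

    wavenumbers: 1-D axis shared by all spectra (any order).
    spectra: array of shape (n_samples, len(wavenumbers)).
    regions: list of (low, high) bounds, fundamental first, then overtones
             (user-supplied; the bounds are hypothetical, not from the paper).
    Each region is linearly interpolated onto n_points equally spaced positions.
    """
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    order = np.argsort(wavenumbers)           # np.interp needs an increasing axis
    axis = wavenumbers[order]
    S = np.asarray(spectra, dtype=float)[:, order]
    tensor = np.empty((S.shape[0], len(regions), n_points))
    for r, (low, high) in enumerate(regions):
        grid = np.linspace(low, high, n_points)
        for i in range(S.shape[0]):
            tensor[i, r] = np.interp(grid, axis, S[i])
    return tensor
```

Resampling every region onto the same number of points is what makes the stacked regions line up as the rows of one matrix per sample.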

20.
The aim of this paper is to characterize the metabolic disorders induced in Kunming mice by S180 and H22 tumor cells. A metabolic fingerprint based on high-performance liquid chromatography with diode array detection (HPLC-DAD) was developed to map the disturbed metabolic responses. The in vivo antitumor activity of paclitaxel (Taxol) was tested by its inhibition of the growth of S180 and H22 tumor cells. Based on 27 common peaks, principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were used to distinguish abnormal samples from controls and to find significant endogenous compounds (SECs) that contribute significantly to the classification. The tumor growth inhibition ratios (TIRs) of the Taxol groups were used to validate the predictive accuracy of the PLS-DA models. The predictive accuracies of the PLS-DA models for the S180 and H22 tumor model groups were 97.6% and 100%, respectively. Nine (S180) and seven (H22) SECs were discovered, including uric acid and cytidine. In addition, the correlations between relative tumor weights (RTWs) and the chromatographic data for the SECs were significant (p < 0.05). Investigations of the stability and precision of the established metabolic fingerprints demonstrate that the experiment is well controlled and reliable. This work shows that HPLC-DAD coupled with chemometric methods is a promising platform for studying metabolic disorders induced by tumor cells. Copyright © 2011 John Wiley & Sons, Ltd.
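Ranking variables by their contribution to a PLS-DA model is commonly done with VIP (variable importance in projection) scores. The sketch below, assuming NumPy, a two-class problem coded as +1/−1, and a basic NIPALS PLS1, computes training-set class predictions and VIP scores. The abstract does not say how the SECs were selected, so this is a generic illustration with hypothetical function names, not the authors' software.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Basic NIPALS PLS1 on mean-centred X and y (generic, not the authors' code).

    Returns weights W, scores T, y-loadings q and regression coefficients B.
    """
    Xk, yk = X.copy(), y.copy()
    W, T, P, q = [], [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        c = (yk @ t) / tt
        Xk = Xk - np.outer(t, p)   # deflate X
        yk = yk - c * t            # deflate y
        W.append(w)
        T.append(t)
        P.append(p)
        q.append(c)
    W, T, P, q = np.column_stack(W), np.column_stack(T), np.column_stack(P), np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)
    return W, T, q, B


def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a (w_ja/||w_a||)^2 / sum_a SS_a), SS_a = q_a^2 t_a't_a."""
    ss = q ** 2 * np.sum(T ** 2, axis=0)
    w_normed = W / np.linalg.norm(W, axis=0)
    return np.sqrt(W.shape[0] * (w_normed ** 2 @ ss) / ss.sum())


def plsda_two_class(X, labels, n_components=2):
    """Fit PLS-DA with labels 1/0 coded as +1/-1; return training predictions and VIP."""
    y = np.where(labels == 1, 1.0, -1.0)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    W, T, q, B = pls1_nipals(X - x_mean, y - y_mean, n_components)
    predicted = np.where(y_mean + (X - x_mean) @ B > 0, 1, 0)
    return predicted, vip_scores(W, T, q)
```

A common convention treats variables with VIP > 1 as important, since the squared VIP scores average to one; predictive accuracy should be judged on samples not used to fit the model.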
