Similar articles
 Found 20 similar articles (search time: 22 ms)
1.
When X and Y are multivariate, the two-block partial least squares (PLS) method is often used. In this paper, we outline an extension addressing a special case of the three-block (X/Y/Z) problem, where Z sits "under" Y. We have called this approach three-block bi-focal PLS (3BIF-PLS). It views the X/Y relationship as the dominant problem, and seeks to use the additional information in Z in order to improve the interpretation of the Y-part of the X/Y association. Two data sets are used to illustrate 3BIF-PLS. Example I relates to single point mutants of haloalkane dehalogenase from Sphingomonas paucimobilis UT26 and their ability to transform halogenated hydrocarbons, some of which are found as organic pollutants in soil. Example II deals with soil remediation capability of bacteria. Whole bacterial communities are monitored over time using "DNA-fingerprinting" technology to see how pollution affects population composition. Since the data sets are large, hierarchical multivariate modelling is invoked to compress data prior to 3BIF-PLS analysis. It is concluded that the 3BIF-PLS approach works well. The paper contains a discussion of pros and cons of the method, and hints at further developmental opportunities.

2.
This article describes the applicability of multivariate projection techniques, such as principal-component analysis (PCA) and partial least-squares (PLS) projections to latent structures, to the large-volume high-density data structures obtained within genomics, proteomics, and metabonomics. PCA and PLS, and their extensions, derive their usefulness from their ability to analyze data with many, noisy, collinear, and even incomplete variables in both X and Y. Three examples are used as illustrations: the first example is a genomics data set and involves modeling of microarray data of cell cycle-regulated genes in the microorganism Saccharomyces cerevisiae. The second example contains NMR-metabonomics data, measured on urine samples of male rats treated with either of the drugs chloroquine or amiodarone. The third and last data set describes sequence-function classification studies in a set of G-protein-coupled receptors using hierarchical PCA.

3.
《Analytical letters》2012,45(6):1227-1251
Abstract

In order to reduce data nonlinearity and overfitting with the multivariate calibration model y = Xb, a modified Tikhonov regularization (TR) algorithm is evaluated for selecting key variables from an X augmented with extra columns that contain the original measured variables (x_ij) as squared terms (x_ij^2) and other orders. The TR approach simultaneously develops the multivariate calibration model. The new generalized pair‐correlation method (GPCM) is also studied for variable selection followed by partial least squares (PLS) for multivariate calibration. Results from synthetic spectral data are compared when using the modified TR approach, GPCM, and PLS without variable selection. The GPCM usually performs slightly better than the TR approach for tabulated bias and variance measures, in some cases at a sacrifice to parsimony. PLS without variable selection performs the worst. Using synthetic spectral data sets made it possible to study how the methods work. Thus, results from this study will aid investigators of real spectral data sets exhibiting nonlinear behavior.
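
The idea of augmenting X with squared terms before a regularized fit can be sketched as follows. This is a minimal illustration that assumes plain standard-form Tikhonov regularization (ridge regression with an identity penalty); it is not the modified TR algorithm of the paper, which also performs variable selection. The function names, the synthetic data, and the value of `lam` are all hypothetical.

```python
import numpy as np

def augment_with_squares(X):
    """Append the element-wise square of every column of X as extra columns."""
    return np.hstack([X, X ** 2])

def tikhonov_fit(X, y, lam):
    """Solve min ||X b - y||^2 + lam * ||b||^2 via the normal equations."""
    n_vars = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_vars), X.T @ y)

# Synthetic example (made-up data): y depends on x_1 linearly and on x_2 quadratically.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2

X_aug = augment_with_squares(X)          # columns 0-3: x_j, columns 4-7: x_j^2
b = tikhonov_fit(X_aug, y, lam=1e-6)     # small penalty, assumed for illustration
```

In this noise-free sketch the fitted vector `b` places its weight on the first original column and on the squared copy of the second column, which is the kind of pattern a variable-selection step would then pick up.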

4.
The suggested catalytic reaction mechanism allows self-oscillations when a diffusion exchange step is added: Z1 → X1, X1 → Y1, 2X1 + Y1 → 3Z1, X1 + Z2 → Z1 + X2. For the corresponding kinetic model, parametric portraits are built and regions with unique unstable steady states (st. s.) are revealed.

5.
A novel method for underdetermined regression problems, multicomponent self-organizing regression (MCSOR), has been recently introduced. Here, its performance is compared with partial least-squares (PLS), which is perhaps the most widely adopted multivariate method in chemometrics. A potpourri of models is presented, and MCSOR appears to provide highly predictive models that are comparable with or better than the corresponding PLS models in large internal (leave-one-out, LOO) and pseudo-external (leave-many-out, LMO) validation tests. The “blind” external predictive ability of MCSOR and PLS is demonstrated employing large melting point, factor Xa, log P and log S data sets. In a nutshell, MCSOR is fast, conceptually simple (employing multiple linear regression, MLR, as a statistical tool), and applicable to all kinds of multivariate problems with single Y-variable.

6.
The partial least-squares (PLS) algorithm has become popular for explorative multivariate data analysis and for multivariate calibration. The same PLS algorithm can also be used for confirmatory data analysis. The discussion is limited to analysis of a single response variable. A close correspondence of PLS1 regression to classical analysis of variance (ANOVA) is demonstrated. The design of an experiment is described in terms of discrete design variables for main effects and simple interactions (dummy variables). These are used as regressors X = (x1, x2, …) for modelling the response variable of the experiment, y. As in conventional use of PLS1 regression, the algorithm gives a concentrated model or diagram of the most important, y-relevant variability types in the X-data. In the present case, this gives the combination of design variables that models the variations in y. A simple plot of the resulting factor loadings immediately reveals the important design variables. Statistical tests and confidence regions in the PLS solution give additional safeguards against interpretation of spurious effects. The method is applied to two data sets. One concerns assessment of personal preference for blackcurrant juice, studied in a 2^5 factorial experiment; these data are also studied with missing values and as fractional factorials. The other concerns spectrophotometric absorbance-based colour assessments of pigment in strawberry jam in a 3-factor design with 2, 2 and 3 levels in the respective factors.
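
How PLS1 on coded design variables exposes the active factors can be sketched with a small example. The code below assumes a textbook orthogonalized NIPALS PLS1 and a made-up, noise-free 2^3 full factorial with main effects only (no interaction columns); the effect sizes 3 and -2 are hypothetical and none of this reproduces the juice or jam data.

```python
import itertools
import numpy as np

def pls1(X, y, n_comp):
    """Orthogonalized PLS1 (NIPALS). Returns loading weights W and regression vector b."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = X.T @ y                      # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = X @ w                        # score vector
        tt = t @ t
        p = X.T @ t / tt                 # X-loading
        c = (y @ t) / tt                 # y-loading
        X = X - np.outer(t, p)           # deflate X
        y = y - c * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients in the X-variables
    return W, b

# Hypothetical 2^3 full factorial, coded -1/+1; factors A and B are active, C is not.
design = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
y = 3.0 * design[:, 0] - 2.0 * design[:, 1]

W, b = pls1(design, y, n_comp=1)
```

Because the coded design is orthogonal, one component is enough here: the loading weights in `W` are large for A and B and zero for C, and `b` returns the effects used to generate `y`.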

7.
In spectroscopy the measured spectra are typically plotted as a function of the wavelength (or wavenumber), but analysed with multivariate data analysis techniques (multiple linear regression (MLR), principal components regression (PCR), partial least squares (PLS)) which consider the spectrum as a set of m different variables. From a physical point of view it could be more informative to describe the spectrum as a function rather than as a set of points, thereby taking into account the physical background of the spectrum, being a sum of absorption peaks for the different chemical components, where the absorbance at two wavelengths close to each other is highly correlated. In a first part of this contribution, a motivating example for this functional approach is given. In a second part, the potential of functional data analysis is discussed in the field of chemometrics and compared to the ubiquitous PLS regression technique using two practical data sets. It is shown that for spectral data, the use of B-splines proves to be an appealing basis to accurately describe the data. By applying both functional data analysis and PLS on the data sets the predictive ability of functional data analysis is found to be comparable to that of PLS. Moreover, many chemometric datasets have some specific structure (e.g. replicate measurements, on the same object or objects that are grouped), but the structure is often removed before analysis (e.g. by averaging the replicates). In the second part of this contribution, we suggest a method to adapt traditional analysis of variance (ANOVA) methods to datasets with spectroscopic data. In particular, the possibilities to explore and interpret sources of variation, such as variations in sample and ambient temperature, are examined. Copyright © 2008 John Wiley & Sons, Ltd.

8.
Equations relating the positional parameters of the anion in the oxide garnets to the mean constituent ionic radii of the cations occupying the {X}, [Y], and (Z) sites have been derived from published garnet structures using multiple regression analysis:
x = 0.0278(22) r{X} + 0.0123(28) r[Y] − 0.0482(16) r(Z) + 0.0141
y = −0.0237(25) r{X} + 0.0200(32) r[Y] + 0.0321(18) r(Z) + 0.0523
z = −0.0102(20) r{X} + 0.0305(25) r[Y] − 0.0217(14) r(Z) + 0.6519
Variations of mean bond lengths with constituent ionic radius are examined for the garnet structures. Deviations of mean bond length from the sum of the constituent ionic radii may be correlated with the ionic radius of the cations at the other sites in the structure.
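
The three regression equations can be evaluated directly. The sketch below takes the coefficients from the equations above and ignores the standard errors given in parentheses; the function name is hypothetical, the radii are assumed to be in ångström (the abstract does not state the unit), and the input radii in the usage line are placeholders, not values for any real garnet.

```python
def garnet_oxygen_position(r_X, r_Y, r_Z):
    """Predict the anion positional parameters (x, y, z) of an oxide garnet
    from the mean ionic radii of the cations on the {X}, [Y] and (Z) sites."""
    x = 0.0278 * r_X + 0.0123 * r_Y - 0.0482 * r_Z + 0.0141
    y = -0.0237 * r_X + 0.0200 * r_Y + 0.0321 * r_Z + 0.0523
    z = -0.0102 * r_X + 0.0305 * r_Y - 0.0217 * r_Z + 0.6519
    return x, y, z

# Placeholder radii, chosen only to exercise the function.
x, y, z = garnet_oxygen_position(r_X=1.0, r_Y=0.6, r_Z=0.3)
```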

9.
Summary. Solid 1:1 complexes of germanic acid with 1,2-diaminopropane-N,N,N′,N′-tetraacetic acid and N-2-hydroxyethyl-1,2-diaminoethane-N,N′,N′-triacetic acid are prepared. On the basis of TG, DTG, DTA and IR spectroscopic investigations, a structure is proposed for the compounds obtained (GeY·2H2O, GeY, GeZ·H2O, GeZ, GeX·H2O, GeX ***). In aqueous solution, GeZ·H2O proves to be a monobasic acid, H[Ge(OH)Z], with a dissociation constant K_c = 3.15·10^−3 (pK_c = 2.50 ± 0.03) at 25 °C and ionic strength 0.1 m. The stability constants of the complexes H[Ge(OH)Z], [Ge(OH)Z] and GeX·H2O in aqueous solution are determined (25 °C, ionic strength 0.1 m). With 3 figures. Part 1: N. Konopik and P. Mészáros, Mh. Chem. 99, 902 (1968). Presented in part at the annual meeting of the Verein Österreichischer Chemiker on 5 October 1967 in Vienna. Cf. Allgem. u. Prakt. Chemie 18, 268 (1967).
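
The quoted dissociation constant and its pK value can be checked against each other with the definition pK = −log10(K). A minimal sketch; the variable names are hypothetical and the only input is the K_c value from the abstract.

```python
import math

K_c = 3.15e-3                  # dissociation constant from the abstract (25 °C)
pK_c = -math.log10(K_c)        # definition of pK
pK_c_rounded = f"{pK_c:.2f}"   # two decimals, as reported in the abstract
```

The result agrees with the reported pK_c = 2.50 well within the stated uncertainty of ±0.03.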

10.
The internal rotation in the HC(=X)YCH2CH3 (X, Y = O or S) series of molecules was studied by the ab initio SCF-MO method using 3-21G and 3-21G + d(ζ = 0.65S) basis sets. Energies and structures of several conformations of these molecules, determined by gradient geometry refinement, are reported and used to assess the effects of oxygen-by-sulphur substitution on molecular properties. The nature and relative importance of intramolecular interactions involving both the -CH2CH3 and the HC(=X)Y (X, Y = O or S) fragments are also discussed. © 1992 by John Wiley & Sons, Inc.

11.
Formal relations between similarity and docking are analyzed, and a general docking theory is proposed for colored mixtures of multivariate distributions. X and Y being two colored mixtures with given marginal distributions, their shape complementarity coefficient is defined as the lower bound of the variance of (X−Y)′·(X−Y), taken over the set of joint distributions of X and Y. The docking is performed via minimization of the shape complementarity coefficient for all translations and rotations of the mixtures. The properties of the docking criterion are derived, and are shown to satisfy the practical requirements encountered in molecular shape analysis.

12.
In multivariate data analysis such as principal components analysis (PCA) and projections to latent structures (PLS), it is essential that the training set systems (objects) are selected to provide data with substantial information for model parametrization, and to represent properly any future situations where the multivariate model is used for predictions. In the framework of multivariate projections (PCA, SIMCA and PLS), elementary concepts of statistical design (fractional factorials and composite designs) can be used with the latent variables (PC or PLS scores) as design variables. The plan of action thus becomes: (1) problem formulation (specify aim and model, make a conceptual division of the investigated system into subsystems); (2) collection of multivariate data for each type of subsystem; (3) estimation of the practical dimensionality of the data for each type of subsystem by PC or PLS analysis; (4) use of the PC or PLS scores (t) as design variables in the combination of subsystems to systems in the training set; (5) measurement of responses (Y); (6) analysis of data by PCA or PLS; (7) interpretation of results with possible feedback to steps 1, 2 or 3. The procedures are illustrated by two problems: a structure/activity relationship for a family of peptides, and optimization of an organic synthesis with respect to system variables (solvent, substrate, co-reactant) and process variables (temperature, reactant concentrations).
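
Steps (3) and (4) of the plan can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the authors' procedure: PCA scores are computed by SVD, each score is scaled to the range [-1, 1], and for every corner of a two-level full factorial in the scores the nearest candidate object is picked. The nearest-corner rule, the function names, and the random candidate data are all hypothetical.

```python
import itertools
import numpy as np

def pca_scores(X, n_comp):
    """Return the first n_comp principal component scores of column-centred X."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_comp] * s[:n_comp]

def select_by_factorial_design(T):
    """For each corner of a two-level full factorial in the scores T,
    return the index of the candidate object closest to that corner."""
    T_scaled = T / np.abs(T).max(axis=0)            # each score now lies in [-1, 1]
    corners = itertools.product([-1.0, 1.0], repeat=T.shape[1])
    chosen = []
    for corner in corners:
        distance = np.linalg.norm(T_scaled - np.array(corner), axis=1)
        chosen.append(int(np.argmin(distance)))
    return chosen

# Made-up descriptor table: 40 candidate subsystems, 6 measured variables.
rng = np.random.default_rng(1)
candidates = rng.normal(size=(40, 6))

T = pca_scores(candidates, n_comp=2)
training_set = select_by_factorial_design(T)        # 2^2 = 4 selected objects
```

The same candidate can be nearest to more than one corner when the score space is sparsely populated, so a practical version would need to handle duplicates and would usually add centre points.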

13.
14.
Partial Least Squares (PLS) is a wide class of regression methods aiming at modelling relationships between sets of observed variables by means of latent variables. Specifically, PLS2 was developed to correlate two blocks of data, the X‐block representing the independent or explanatory variables and the Y‐block representing the dependent or response variables. Lately, OPLS was introduced to further reduce model complexity by removing Y‐orthogonal sources of variation from X in the latent space, thus improving data interpretation through the generated predictive latent variables. Nevertheless, relationships between PLS2 and OPLS in the case of multiple Y‐responses have not yet been fully explored. With this perspective and taking inspiration from some basic mathematical properties of PLS2, we here present a novel and general approach consisting of a post‐transformation of PLS2 (ptPLS2), which results in a decomposition of the latent space into orthogonal and predictive components, while preserving the same goodness of fit and predictive ability of PLS2. Additionally, we discuss the application of the ptPLS2 approach to two metabolomic data sets extracted from earlier published studies and its advantages in model interpretation as compared with the ‘standard’ PLS approach. Copyright © 2016 John Wiley & Sons, Ltd.

15.
The well‐known Martens factorization for PLS1 produces a single y‐related score, with all subsequent scores being y‐unrelated. The X‐explanatory value of these y‐orthogonal scores can be summarized by a simple expression, which is analogous to the ‘P’ loading weights in the orthogonalized NIPALS algorithm. This can be used to rearrange the factorization into entirely y‐related and y‐unrelated parts. Systematic y‐unrelated variation can thus be removed from the X data through a single post hoc calculation following conventional PLS, without any recourse to the orthogonal projections to latent structures (OPLS) algorithm. The work presented is consistent with the development by Ergon (PLS post‐processing by similarity transformation (PLS + ST): a simple alternative to OPLS. J. Chemom. 2005; 19 : 1–4), which shows that conventional PLS and OPLS are equivalent within a similarity transform. Copyright © 2009 John Wiley & Sons, Ltd.

16.
Nonadditive effects of substituents X and Y are manifested in reactions of symmetrically X-substituted trans-2,3-diaryloxiranes with Y-substituted arenesulfonic acids. The isoparametric point is reached experimentally: close to the point τ_X^IP = 4.73 for substituent X, the rate of oxirane ring opening of trans-2,3-bis(3-bromo-5-nitrophenyl)oxirane (τ_X = 4.38) is practically independent of substituent Y (ρ_Y^X = 0.10 ± 0.05). The results of cross correlation analysis of the kinetic data are applied to interpretation of the mechanism of the studied reactions.

17.
Research has been carried out to determine the potential of partial least squares (PLS) modeling of mid-infrared (IR) spectra of crude oils combined with the corresponding 1H and 13C nuclear magnetic resonance (NMR) data, to predict the long residue (LR) properties of these substances. The study elaborates further on a recently developed and patented method to predict this type of information from only IR spectra. In the present study, PLS modeling was carried out for 7 different LR properties, i.e., yield long-on-crude (YLC), density (DLR), viscosity (VLR), sulfur content (S), pour point (PP), asphaltenes (Asph) and carbon residue (CR). Research was based on the spectra of 48 crude oil samples, of which 28 were used to build the PLS models and the remaining 20 for validation. For each property, PLS modeling was carried out on single type IR, 13C NMR and 1H NMR spectra and on 3 sets of merged spectra, i.e., IR + 1H NMR, IR + 13C NMR and IR + 1H NMR + 13C NMR. The merged spectra were created by considering the NMR data as a scaled extension of the IR spectral region. In addition, PLS modeling of coupled spectra was performed after a Principal Component Analysis (PCA) of the IR, 13C NMR and 1H NMR calibration sets. For these models, the 10 most relevant PCA scores of each set were concatenated and scaled prior to PLS modeling. The validation results of the individual IR models, expressed as root-mean-square-error-of-prediction (RMSEP) values, turned out to be slightly better than those obtained for the models using single input 13C NMR or 1H NMR data. For the models based on IR spectra combined with NMR data, no significant improvement of the RMSEP values was observed, either for the models based on merged spectra or for those based on the PCA scores. This implies that the commonly accepted complementary character of NMR and IR is, at least for the crude oil and bitumen samples under study, not reflected in the results of PLS modeling.
Given these results, the absence of sample preparation, and the straightforward data acquisition, IR spectroscopy is preferred over NMR for the prediction of LR properties of crude oils on site.
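
Two of the operations named in this abstract are simple enough to sketch: merging spectral blocks by treating one as a scaled extension of the other, and scoring a validation set by RMSEP. The code below is an illustration only. The function names, the scale factor 0.5, and the channel counts (200 and 50) are assumptions; the arrays are random placeholders, and only the 28/20 split of 48 samples comes from the abstract.

```python
import numpy as np

def merge_spectra(ir, nmr, nmr_scale=1.0):
    """Append the scaled NMR block to the IR block, column-wise."""
    return np.hstack([ir, nmr_scale * nmr])

def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction over a validation set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Placeholder spectra for 48 samples (not real crude-oil data).
rng = np.random.default_rng(0)
ir = rng.normal(size=(48, 200))
nmr = rng.normal(size=(48, 50))

merged = merge_spectra(ir, nmr, nmr_scale=0.5)
calibration, validation = merged[:28], merged[28:]   # 28 to build, 20 to validate
```

RMSEP is expressed in the units of the predicted property, so values for different LR properties (for example density and sulfur content) are not directly comparable with each other.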

18.
Bio-pharmaceutical manufacturing is a multifaceted and complex process wherein, during the manufacture of a single batch, hundreds of processing variables and raw materials are monitored. In these processes, identifying the candidate variables responsible for any changes in process performance can prove to be extremely challenging. Within this context, partial least squares (PLS) has proven to be an important tool in helping determine the root cause for changes in biological performance, such as cellular growth or viral propagation. In spite of the positive impact PLS has had in helping understand bio-pharmaceutical process data, the high variability in measured response (Y) and predictor variables (X), and the weak relationship between X and Y, have at times made root cause determination for process changes difficult. Our goal is to demonstrate how the use of bootstrapping, in conjunction with permutation tests, can provide avenues for improving the selection of variables responsible for manufacturing process changes via the variable importance in the projection (PLS-VIP) statistic. Although applied uniquely to the PLS-VIP in this article, the generality of the aforementioned methods can be used to improve other variable selection methods, in addition to increasing confidence around other estimates obtained from a PLS model.
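
The combination of bootstrapping and permutation testing around the VIP statistic can be sketched as follows. This is a minimal illustration assuming a single-response NIPALS PLS1 and the commonly used VIP formula, VIP_j = sqrt(p · Σ_a SS_a w_ja² / Σ_a SS_a) with SS_a = c_a² t_a′t_a; it is not the authors' implementation, and the function names, the synthetic data, and the resampling counts are hypothetical.

```python
import numpy as np

def pls1_vip(X, y, n_comp):
    """Fit PLS1 by NIPALS and return the VIP score of every X-variable."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_comp))
    ss = np.zeros(n_comp)                 # y-variance explained per component
    for a in range(n_comp):
        w = X.T @ y
        w = w / np.linalg.norm(w)         # unit-length loading weights
        t = X @ w
        tt = t @ t
        c = (y @ t) / tt
        X = X - np.outer(t, X.T @ t / tt) # deflate X
        y = y - c * t                     # deflate y
        W[:, a] = w
        ss[a] = c ** 2 * tt
    return np.sqrt(n_vars * (W ** 2 @ ss) / ss.sum())

def bootstrap_vip(X, y, n_comp, n_boot, rng):
    """VIP scores recomputed on n_boot row-resampled copies of the data."""
    n = X.shape[0]
    out = np.empty((n_boot, X.shape[1]))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        out[i] = pls1_vip(X[idx], y[idx], n_comp)
    return out

def permutation_pvalues(X, y, n_comp, n_perm, rng):
    """Share of y-permutations whose VIP reaches the observed VIP, per variable."""
    observed = pls1_vip(X, y, n_comp)
    exceed = np.zeros(X.shape[1])
    for _ in range(n_perm):
        exceed += pls1_vip(X, rng.permutation(y), n_comp) >= observed
    return (exceed + 1) / (n_perm + 1)

# Made-up process data: only the first of five variables drives the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=60)

vip = pls1_vip(X, y, n_comp=2)
vip_boot = bootstrap_vip(X, y, n_comp=2, n_boot=200, rng=rng)
p_values = permutation_pvalues(X, y, n_comp=2, n_perm=200, rng=rng)
```

Percentiles of the columns of `vip_boot` give an interval for each VIP score, while `p_values` indicates how often a variable would look that important if X and y were unrelated.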

19.
20.
