Similar Documents
20 similar documents found.
1.
A new method has been developed for predicting homology model quality directly from the sequence alignment using multivariate regression, so that the expected quality of a future homology model can be estimated from primary-structure information alone. The method has been applied to protein kinases and can readily be extended to other protein families. The quality of a reference set of homology models was assessed against experimental structures by calculating root-mean-square deviations (RMSDs) and comparing inter-residue contact areas. These quality measures were then used as dependent variables in a partial least squares (PLS) regression, with a matrix of alignment score profiles derived from the Point Accepted Mutation (PAM) 250 similarity matrix as the independent variables. The resulting regression model predicts the accuracy of future homology models from the sequence alignment, which makes it possible to identify the target-template combinations most likely to yield homology models of sufficient quality and hence to choose the optimal templates for homology modeling. The method's ability to guide template choice was verified by comparing its success rate with those obtained using BLAST scores and target-template sequence identities. The method presented here performed best: it chose the optimal template in 86% of cases, compared with 62% using BLAST scores and 57% using sequence identities. It can also be used to identify regions of the protein structure that are difficult to model, as well as alignment errors, which makes it a useful tool for ensuring that the best possible homology model is generated.
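The PLS regression step can be sketched with a minimal PLS1 model fitted by the NIPALS algorithm. This is an illustration under assumptions, not the authors' implementation: the abstract uses several quality measures as responses (a PLS2 setting), whereas this sketch handles a single response, and the names `pls1_fit` and `pls1_predict` are hypothetical. In the setting described, each row of `X` would hold the PAM250-based alignment score profile of one target-template pair and `y` a quality measure such as RMSD.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS; returns (coef, x_mean, y_mean)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centred X and y residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)               # unit-length weight vector
        t = E @ w                            # score vector
        tt = t @ t
        p = E.T @ t / tt                     # X loading
        c = f @ t / tt                       # y loading (scalar for PLS1)
        E = E - np.outer(t, p)               # deflate X
        f = f - c * t                        # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Regression vector expressed in the original (centred) X variables.
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean
```

Ranking candidate templates by their predicted quality (for example, lowest predicted RMSD first) would then mirror the template-selection use described in the abstract.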

2.
Near-infrared (NIR) spectrometry would be a more promising tool for quantitative measurement if the robustness and predictive ability of partial least squares (PLS) models were improved. To this end, we present a new algorithm for simultaneous wavelength selection and outlier detection that also addresses background and noise in multivariate calibration. The strategy combines the continuous wavelet transform (CWT) with a modified iterative predictors and objects weighting PLS (mIPOW-PLS). CWT is applied as a pretreatment that removes background and noise simultaneously; mIPOW-PLS then removes both uninformative wavelengths and multiple outliers in the CWT domain. After CWT-mIPOW-PLS pretreatment, a final PLS model is built for prediction. The results indicate that combining CWT and mIPOW-PLS produces robust, parsimonious regression models that use very few wavelengths.
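The CWT pretreatment can be illustrated with a single-scale transform. The abstract does not name the mother wavelet or the scales, so this sketch assumes a Mexican-hat (Ricker) wavelet and an arbitrary scale; `mexican_hat` and `cwt_single_scale` are hypothetical helper names, and the mIPOW-PLS step is not shown. Because the sampled wavelet is symmetric and forced to a zero mean, a constant or linearly sloping background gives zero coefficients away from the spectrum edges, while a peak still produces a strong response.

```python
import numpy as np


def mexican_hat(scale, half_width=5.0):
    """Sampled Mexican-hat (Ricker) wavelet at one scale, forced to an exactly zero mean."""
    n = int(np.ceil(half_width * scale))
    t = np.arange(-n, n + 1) / scale
    psi = (1.0 - t**2) * np.exp(-t**2 / 2.0)
    psi -= psi.mean()                  # remove the small bias caused by truncation
    return psi / np.sqrt(scale)


def cwt_single_scale(spectrum, scale):
    """Wavelet coefficients of a 1-D spectrum at one scale (same length as the input)."""
    psi = mexican_hat(scale)
    # The kernel is symmetric, so convolution and correlation coincide here.
    return np.convolve(np.asarray(spectrum, dtype=float), psi, mode="same")
```

Only coefficients at least half a kernel width (here 20 points for scale 4) away from either edge are free of boundary effects.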

3.
Estimating the prediction region of partial least squares (PLS) is necessary in many engineering applications. However, research in this area has focused only on estimating prediction intervals. In this work, a new recursive formulation of PLS is proposed that facilitates calculation of the Jacobian matrix of the estimated coefficient matrix. A computational complexity analysis shows that the proposed algorithm is $O(m^2N + mpN + mpN^2 + mN^3 + mpN^4)$ per component. The prediction region of multivariate PLS is then obtained through local linearization, so the new formulation provides one way to compute it. Case studies on simulated data and on near-infrared spectra of corn demonstrate the utility of the proposed method. Copyright © 2013 John Wiley & Sons, Ltd.

4.
This work shows that independent component analysis (ICA) can be used to obtain statistically independent and, therefore, chemically interpretable latent variables (LVs) in multivariate regression. Two novel ICA-based algorithms are introduced and compared on simulated data with two classical methods, principal component regression and partial least-squares regression. All of the methods yield accurate predictions, but only the ICA-based ones yield chemically interpretable LVs. Practical limitations of ICA-based regression with respect to its underlying assumptions, sample size, and measurement noise are discussed and illustrated by simulations.
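The ICA step can be sketched with FastICA, a widely used fixed-point ICA algorithm. The abstract does not say which ICA algorithm the two proposed regression methods use, so this choice, the deflation scheme, the tanh nonlinearity, and the function name `fastica` are assumptions. Regressing a response on the recovered independent components (for example, by ordinary least squares) would then give an ICA-based regression model.

```python
import numpy as np


def fastica(X, n_components, max_iter=500, tol=1e-8, seed=0):
    """Deflationary FastICA with the tanh (log-cosh) nonlinearity.

    X has shape (n_samples, n_signals); returns the estimated sources,
    shape (n_samples, n_components), up to order, sign and scale.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Whitening via the eigen-decomposition of the covariance matrix.
    d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(d)[::-1][:n_components]
    K = E[:, order] / np.sqrt(d[order])       # whitening matrix
    Z = Xc @ K                                 # whitened data with unit covariance
    W = np.zeros((n_components, n_components))
    for i in range(n_components):
        w = rng.normal(size=n_components)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            u = Z @ w
            g, g_prime = np.tanh(u), 1.0 - np.tanh(u) ** 2
            # Fixed-point update: E[z g(w'z)] - E[g'(w'z)] w
            w_new = (Z * g[:, None]).mean(axis=0) - g_prime.mean() * w
            # Gram-Schmidt against the components already found.
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[i] = w
    return Z @ W.T
```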

5.
In multivariate spectral calibration by principal component regression (PCR), the principal components (PCs) are calculated from the responses measured at all instrument channels; however, some channels are redundant and their responses carry no useful information, so the extracted PCs mix information from useful and redundant channels. In this work, we propose a segmentation approach based on unsupervised pattern recognition that identifies the most informative spectral regions and then builds a stable multivariate calibration model by PCR. The instrument channels are clustered into segments with a Kohonen self-organizing map. The spectral data of each segment are then subjected to PCA, and the resulting PCs are used as input variables for an inverse least squares (ILS) regression model with stepwise selection of the informative PCs. The method was evaluated on four simulated and six experimental data sets. It modeled these data sets with lower prediction errors than conventional partial least squares (PLS) and PCR, and its predictive ability was also better than that of previously reported models for the same data sets. Copyright © 2011 John Wiley & Sons, Ltd.

6.
The aim of this study was to develop an empirical model that accurately predicts the biochemical oxygen demand of the effluent from the aerated lagoon at International Paper of Brazil, one of the major pulp and paper plants in Brazil. Predictive models were built with functional link neural networks (FLNNs), multiple linear regression, principal components regression, and partial least-squares regression (PLSR). FLNN modeling improved when the data were preprocessed with PLSR. PLSR also proved to be a powerful linear regression technique for this problem, which is constrained by limitations in the available operational data.
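A functional link neural network replaces hidden layers with a fixed nonlinear expansion of the inputs followed by a single linear output layer. The sketch below assumes a trigonometric expansion and fits the output weights in closed form by least squares rather than by iterative training; the abstract describes neither the expansion nor the training, and `functional_expansion`, `flnn_fit`, and `flnn_predict` are hypothetical names. With the PLSR preprocessing mentioned above, PLS scores would be passed in place of the raw inputs.

```python
import numpy as np


def functional_expansion(X, order=2):
    """Trigonometric functional-link expansion of each input column, plus a bias column."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                     # treat a 1-D array as a single input column
    feats = [np.ones((X.shape[0], 1)), X]
    for k in range(1, order + 1):
        feats += [np.sin(k * np.pi * X), np.cos(k * np.pi * X)]
    return np.hstack(feats)


def flnn_fit(X, y, order=2):
    """Single-layer FLNN: linear output weights fitted by least squares."""
    H = functional_expansion(X, order)
    weights, *_ = np.linalg.lstsq(H, y, rcond=None)
    return weights


def flnn_predict(weights, X, order=2):
    return functional_expansion(X, order) @ weights
```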

7.
To select the chromatographic starting conditions to be optimized during subsequent method development for separating a given mixture, so-called generic orthogonal chromatographic systems can be explored in parallel. This paper studies the use of univariate and multivariate regression trees (MRT) to define the most orthogonal subset of a given set of chromatographic systems. Two data sets were considered, containing the retention data of 68 structurally diverse drugs on sets of 32 and 38 chromatographic systems, respectively. For both the univariate and the multivariate approaches, only the measured retention factors are needed to build the decision trees. Because the multivariate regression trees are used in an unsupervised way, they are called auto-associative multivariate regression trees (AAMRT). For every decision tree used, a variable importance list of the predictor variables can be derived. It was concluded that, for both univariate and multivariate regression trees, these ranked lists allow the most orthogonal systems in a given set to be selected quickly and in a user-friendly way.

8.
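The quantum-chemical parameters of 14 novel pyrimidinyl thiosalicylic acid derivatives were calculated by quantum-chemical methods, and molecular mechanics was used for full conformational optimization, from which molecular geometry and related parameters were computed. A QSAR analysis of the compounds' inhibitory activity against acetolactate synthase (ALS) showed that steric and electrostatic effects are the main factors governing the structure-activity relationship, and the neural network method gave better predictions.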

9.
10.
A new procedure is presented that improves the predictive ability of multivariate calibration models while using a small number of interpretable variables. The core of the methodology is to sort the variables according to an informative vector and then systematically investigate PLS regression models, comparing their cross-validation parameters to find the most relevant set of variables. In this work, seven main informative vectors, i.e. the regression vector, correlation vector, residual vector, variable influence on projection (VIP), net analyte signal (NAS), covariance procedures vector (CovProc), and signal-to-noise ratio vector (StN), together with their combinations, were automated and tested for feature selection. Six data sets from different sources were used to validate the methodology: near-infrared (NIR) spectroscopy, Raman spectroscopy, gas chromatography (GC), fluorescence spectroscopy, quantitative structure-activity relationships (QSAR), and computer simulation. The results indicate that all vectors and their combinations improved predictive ability relative to the full data sets. However, in most of the tested data sets the best informative vectors for variable selection were the regression and NAS vectors from partial least squares (PLS) regression, both computed with more latent variables than were used to build the final model. In all applications, the selected variables were effective and useful for interpretation. Copyright © 2008 John Wiley & Sons, Ltd.
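The sort-then-search idea can be sketched as follows. To keep the example short, it assumes the correlation vector (one of the seven informative vectors listed) for ranking and an ordinary least-squares model with exact leave-one-out residuals in place of the PLS models used in the study; `loo_rmsecv` and `select_by_informative_vector` are hypothetical names. Each candidate subset is the top-k ranked variables, and the subset with the lowest RMSECV is kept.

```python
import numpy as np


def loo_rmsecv(X, y):
    """Leave-one-out RMSE of an OLS model with intercept, via the hat-matrix shortcut."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)                  # hat (projection) matrix
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))     # exact leave-one-out residuals for OLS
    return np.sqrt(np.mean(loo_resid ** 2))


def select_by_informative_vector(X, y, max_vars=None):
    """Rank variables by |correlation with y|, then keep the top-k subset with the lowest RMSECV."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    order = np.argsort(-np.abs(corr))
    max_vars = max_vars or X.shape[1]
    scores = [loo_rmsecv(X[:, order[:k]], y) for k in range(1, max_vars + 1)]
    best_k = int(np.argmin(scores)) + 1
    return order[:best_k], scores
```

The number of variables tried must stay well below the number of samples, otherwise the leverages approach 1 and the shortcut breaks down.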

11.
This paper presents a multivariate regression method for the simultaneous determination of sugar (sucrose as a sugar equivalent) and ethanol concentrations in aqueous solutions from temperature-dependent ultrasonic velocity. Samples with different combinations of concentrations were exposed to temperatures ranging from 2 to 30 °C to investigate the temperature dependence of the ultrasonic velocity, and models were calibrated to predict the concentrations of interest. From the experimental results, polynomial regression yielded two equations for calculating unknown concentrations, in which each concentration depends functionally on the other. However, this model still contains side effects and systematic errors. To avoid these problems and to reduce the absolute errors in determining unknown samples, multivariate regression methods such as partial least squares (PLS) were tested and compared with polynomial regression. On average, the chemometric models were about three times more accurate: the prediction errors for sucrose concentration were around 0.4 g/100 g for the regression model with polynomial background (RMPA) and around 0.12 g/100 g for the PLS model, and those for ethanol concentration were 0.13 and 0.04 g/100 g, respectively. Furthermore, each concentration can be calculated without knowing the concentration of the other solute. Copyright © 2011 John Wiley & Sons, Ltd.

12.
This paper describes a reversed-phase high-performance liquid chromatographic (HPLC) method for determining hydrocortisone acetate, hydrocortisone hemisuccinate and lidocaine. The separation was performed on a LiChroCART C18 column in isocratic mode, with methanol-NaH₂PO₄/Na₂HPO₄ buffer (0.1 mol L⁻¹, pH 4.5) (60:40, v/v) as the mobile phase. The flow rate was 1 mL min⁻¹ and the injection volume 20 µL. Detection was performed with a diode-array detector at the absorption maximum of each compound. Quantification limits of 0.18 to 0.84 µg L⁻¹ were obtained when peak areas were measured. The method was applied to pharmaceutical formulations, and the results were compared with those obtained by multivariate regression spectrophotometry and by micellar capillary electrophoresis (MEKC). The HPLC results agree with those obtained by MEKC, whereas the spectrophotometric method was suitable only for synthetic samples.
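The abstract does not state how the quantification limits were calculated. One common convention (ICH Q2) estimates the quantification limit as 10σ/S, where S is the calibration slope and σ the standard deviation of the response; the sketch below assumes that convention, with σ taken as the residual standard deviation of a linear peak-area calibration. `calibration_loq` is a hypothetical helper.

```python
import numpy as np


def calibration_loq(conc, area):
    """Linear calibration area = slope*conc + intercept, and LOQ = 10 * s_res / slope."""
    conc, area = np.asarray(conc, dtype=float), np.asarray(area, dtype=float)
    slope, intercept = np.polyfit(conc, area, 1)
    resid = area - (slope * conc + intercept)
    s_res = np.sqrt(np.sum(resid ** 2) / (len(conc) - 2))   # residual standard deviation
    return slope, intercept, 10.0 * s_res / slope
```

The returned limit is in the concentration units of `conc`.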

13.
The thermal behavior of nitroguanidine (NQ) was investigated by simultaneous TG/DSC-MS-FTIR analysis under both isothermal and non-isothermal conditions. The isothermal test at 230 °C indicated that the release of gaseous products occurs in several stages. The non-isothermal data, recorded at heating rates of 5, 10, 15, and 20 K/min, were processed with Netzsch Thermokinetics. The dependence of the activation energy, evaluated by Friedman's isoconversional method, on the degree of conversion shows that the process is complex and can be divided into three parts. The reaction mechanism and the corresponding kinetic parameters were determined with the Multivariate Non-linear Regression program. The kinetic results were used to simulate the thermal decomposition of NQ under isothermal conditions at 210 °C, and the simulated curve agrees with the experimental curve. The results were also used to predict the thermal lifetime of NQ at a given temperature.
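Friedman's isoconversional method rests on $\ln(d\alpha/dt)_{\alpha,i} = \ln[A f(\alpha)] - E_\alpha/(R T_{\alpha,i})$: at a fixed conversion α, plotting ln(dα/dt) against 1/T across the heating rates i gives a straight line of slope -E_α/R. The sketch below performs that fit for one conversion level; the synthetic first-order data and the name `friedman_activation_energy` are assumptions for illustration, not NQ data or the Netzsch implementation. Repeating the fit over a grid of α values gives the E(α) dependence that the abstract uses to identify a multistep process.

```python
import numpy as np

R = 8.314462618  # gas constant, J mol^-1 K^-1


def friedman_activation_energy(T_alpha, rate_alpha):
    """Friedman estimate of the activation energy at one conversion level.

    T_alpha: temperatures (K) at which the chosen conversion is reached, one per heating rate.
    rate_alpha: the corresponding conversion rates d(alpha)/dt (s^-1).
    Returns (E in J/mol, intercept = ln[A f(alpha)]).
    """
    inv_T = 1.0 / np.asarray(T_alpha, dtype=float)
    ln_rate = np.log(np.asarray(rate_alpha, dtype=float))
    slope, intercept = np.polyfit(inv_T, ln_rate, 1)
    return -slope * R, intercept
```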

14.
Some unconventional non-linear modeling techniques, namely classification and regression trees and methods based on multivariate adaptive regression splines, were explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structurally and pharmacologically diverse drugs, taken from a structured knowledge-based database. Models were built with boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as with two-step combinations of each with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression. The best models were obtained by combining MARS with either stepwise MLR or PLS. It was concluded that combining a linear with a non-linear modeling technique improves some properties compared with the individual linear and non-linear models, and that, where such a combination is appropriate, combinations using MARS as the non-linear technique should be preferred over those using BRT, because the BRT approaches have some serious drawbacks.
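MARS builds its model from pairs of hinge functions, max(0, x - t) and max(0, t - x). The sketch below shows only one forward step for a single predictor: every interior data value is tried as the knot t, and the pair giving the lowest residual sum of squares is kept. Full MARS repeats this over all predictors and existing terms and then prunes with generalized cross-validation; `hinge` and `best_single_knot` are hypothetical names.

```python
import numpy as np


def hinge(x, knot):
    """MARS basis pair: max(0, x - knot) and max(0, knot - x)."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)


def best_single_knot(x, y):
    """One MARS forward step for one predictor: return (rss, knot, coef) of the best hinge pair."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    best = (np.inf, None, None)
    # Skip the extreme values, where one member of the hinge pair would be identically zero.
    for knot in np.unique(x)[1:-1]:
        h1, h2 = hinge(x, knot)
        B = np.column_stack([np.ones_like(x), h1, h2])
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        rss = np.sum((y - B @ coef) ** 2)
        if rss < best[0]:
            best = (rss, knot, coef)
    return best
```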

15.
This paper presents the inline quantification of Penicillin V and its precursor phenoxyacetic acid during Penicillium chrysogenum fermentations by FTIR spectroscopy combined with partial least squares (PLS) regression and multivariate curve resolution-alternating least squares (MCR-ALS). First, the applicability of an attenuated total reflection (ATR) FTIR fiber-optic probe was assessed offline by measuring standards of the analytes of interest and investigating matrix effects of the fermentation broth. Measurements were then performed inline during four fed-batch fermentations, with online HPLC determination of Penicillin V and phenoxyacetic acid as the reference analysis. PLS and MCR-ALS models were built from these data and validated by comparing single-analyte spectra with the selectivity ratios of the PLS models and with the spectral traces extracted by the MCR-ALS models, respectively. The root mean square errors of cross-validation of the PLS regressions were 0.22 g L⁻¹ for Penicillin V and 0.32 g L⁻¹ for phenoxyacetic acid, and the root mean square errors of prediction of MCR-ALS were 0.23 g L⁻¹ for Penicillin V and 0.15 g L⁻¹ for phenoxyacetic acid. A general workflow for building and assessing chemometric regression models to quantify multiple analytes in bioprocesses by FTIR spectroscopy is given.
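MCR-ALS factors a data matrix D (one spectrum per row) into concentration profiles C and pure spectra S, D ≈ C Sᵀ, by alternating least-squares solves under constraints. The sketch below keeps only non-negativity (applied by clipping) and uses two measured spectra as initial estimates; both are assumptions, since the abstract lists neither the constraints nor the initialization, and `mcr_als` is a hypothetical name. The lack of fit it reports is a common MCR-ALS diagnostic; without further constraints the factors themselves remain subject to rotational ambiguity.

```python
import numpy as np


def mcr_als(D, S_init, n_iter=200):
    """Minimal MCR-ALS: D (samples x channels) ~ C @ S.T, non-negativity by clipping.

    S_init: initial spectral estimates, shape (channels, n_components).
    Returns C, S and the lack of fit in percent of the Frobenius norm of D.
    """
    D = np.asarray(D, dtype=float)
    S = np.asarray(S_init, dtype=float)
    for _ in range(n_iter):
        C = np.clip(D @ np.linalg.pinv(S.T), 0.0, None)    # least-squares C given S
        S = np.clip(D.T @ np.linalg.pinv(C.T), 0.0, None)  # least-squares S given C
    lof = 100.0 * np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)
    return C, S, lof
```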

16.
17.
Neural networks in multivariate calibration.
F. Despagne, D. L. Massart. The Analyst, 1998, 123(11): 157R-178R.

18.
Two spectrophotometric methods are presented for determining ethinylestradiol (ETE) and levonorgestrel (LEV) using the multivariate calibration techniques partial least squares (PLS) and principal component regression (PCR). PLS and PCR were successfully applied to quantify both hormones from the information contained in the absorption spectra of suitable solutions. To this end, a calibration set of standard samples composed of different mixtures of both compounds was designed. Results are reported for the simultaneous determination of mixtures containing 4-11 µg mL⁻¹ of ETE and 2-23 µg mL⁻¹ of LEV. Five different oral contraceptives were analyzed, and the results were very similar to those obtained by a reference liquid chromatographic method.
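Principal component regression, one of the two calibration methods used, can be sketched as PCA of the mean-centred spectra by singular value decomposition, followed by least squares on the leading scores. The simulated spectra in the usage test are hypothetical (two Gaussian bands standing in for the ETE and LEV absorption spectra); only the concentration ranges come from the abstract, and `pcr_fit` and `pcr_predict` are hypothetical names. For noise-free Beer's-law mixtures of two compounds, two components reproduce both concentrations exactly.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression: PCA of centred X via SVD, then least squares on the scores.

    y may be 1-D (one analyte) or 2-D (one column per analyte).
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    V = Vt[:n_components].T                  # loadings (channels x components)
    T = (X - x_mean) @ V                     # scores
    q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    coef = V @ q                             # regression vector(s) in wavelength space
    return coef, x_mean, y_mean


def pcr_predict(model, X):
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean
```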

19.
20.
In multivariate calibration with spectral datasets, variable selection is often applied to identify a relevant subset of variables, which can improve prediction accuracy and ease interpretation of the selected fingerprint regions. Numerous variable selection methods have been proposed to date, but choosing among them is not trivial. Furthermore, the variable sets found by these methods are often not robust because of irreproducibility and uncertainty, which makes improving the reliability of variable selection a major challenge. In this study, the reproducibility of five variable selection methods was investigated quantitatively to evaluate their performance. Reproducibility was quantified with Monte-Carlo sub-sampling (MCS) together with a quantitative similarity measure designed for highly collinear spectral datasets. An investigation of the reproducibility and prediction accuracy of these algorithms on two near-infrared (NIR) datasets showed that the methods varied widely in performance, especially in their ability to identify a consistent subset of variables. Thus, a thorough assessment of reproducibility together with the predictive accuracy of the identified variables improves the statistical validity of, and confidence in, the selection outcome, which conventional evaluation schemes cannot address.
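Reproducibility under Monte-Carlo sub-sampling can be sketched as follows: repeatedly draw a random subset of samples, run a variable selector on it, and measure how similar the selected subsets are. The study uses a similarity measure designed for highly collinear spectra; this sketch substitutes the plain Jaccard index, which does not account for collinearity, and uses a toy top-k correlation selector. Both substitutions and all function names (`top_k_by_correlation`, `mcs_reproducibility`) are assumptions.

```python
import numpy as np


def top_k_by_correlation(X, y, k):
    """Toy selector: indices of the k variables with the largest |correlation| with y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return set(np.argsort(-np.abs(r))[:k].tolist())


def mcs_reproducibility(X, y, select, n_runs=100, frac=0.8, seed=0):
    """Mean pairwise Jaccard similarity of the subsets selected on Monte-Carlo sub-samples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    subsets = []
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        subsets.append(select(X[idx], y[idx]))
    sims = [len(a & b) / len(a | b)
            for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    return float(np.mean(sims))
```

A value near 1 means the selector returns essentially the same variables on every sub-sample; the same routine can be run with any selector to compare methods.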
