Similar literature
20 similar articles found
1.
2.
The prediction of the partition behaviour of proteins in aqueous two-phase systems (ATPS) using mathematical models based on their amino acid composition was investigated. The predictive models are based on the average surface hydrophobicity (ASH). The ASH was estimated by means of models that use the three-dimensional structure of proteins and by models that use only the amino acid composition of proteins. These models were evaluated for a set of 11 proteins with known experimental partition coefficients in four two-phase systems: polyethylene glycol (PEG) 4000/phosphate, sulfate, citrate and dextran, considering three levels of NaCl concentration (0.0% w/w, 0.6% w/w and 8.8% w/w). The results indicate that such prediction is feasible, although the quality of the prediction depends strongly on the ATPS and its operational conditions, such as the NaCl concentration. The ATPS 0 model, which uses the three-dimensional structure, obtains results similar to those given by previous models based on variables measured in the laboratory. In addition, it maintains the main characteristics of the hydrophobic resolution and intrinsic hydrophobicity reported before. Three mathematical models, ATPS I-III, based only on the amino acid composition were evaluated. The best results were obtained by the ATPS I model, which assumes that all of the amino acids are completely exposed. The performance of the ATPS I model follows the behaviour reported previously, i.e. its correlation coefficients improve as the NaCl concentration in the system increases and, therefore, the effect of the protein hydrophobicity prevails over other effects such as charge or size. Its best predictive performance was obtained for the PEG/dextran system at high NaCl concentration. An increase in the predictive capacity of at least 54.4% with respect to the models which use the three-dimensional structure of the protein was obtained for that system. In addition, the ATPS I model exhibits high correlation coefficients in that system, higher than 0.88 on average. The ATPS I model exhibited correlation coefficients higher than 0.67 for the rest of the ATPS at high NaCl concentration. Finally, we tested our best model, the ATPS I model, on the prediction of the partition coefficient of the protein invertase. We found that the predictive capacities of the ATPS I model are better in PEG/dextran systems, where the relative error of the prediction with respect to the experimental value is 15.6%.
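
To make the "completely exposed" assumption concrete, the sketch below computes a composition-weighted mean hydrophobicity in which every residue counts equally. It is an illustration only: the abstract does not give the ATPS I formula (it may, for instance, weight residues by surface area), and `TOY_SCALE` and `composition_ash` are hypothetical names with made-up values, not a published hydrophobicity scale.

```python
from collections import Counter

# Hypothetical hydrophobicity values for a handful of residues.
# These numbers are placeholders, NOT a published scale.
TOY_SCALE = {"A": 0.6, "L": 1.0, "K": 0.1, "D": 0.0, "G": 0.5}


def composition_ash(sequence, scale):
    """Composition-weighted mean hydrophobicity.

    Every residue is treated as fully exposed, so each one contributes
    in proportion to how often it occurs in the sequence.
    """
    counts = Counter(sequence)
    total = sum(counts.values())
    return sum(scale[aa] * n for aa, n in counts.items()) / total


if __name__ == "__main__":
    print(composition_ash("ALKDG", TOY_SCALE))
```

Because only the composition enters, two sequences with the same residue counts get the same value regardless of residue order or three-dimensional structure.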

3.
This paper focuses on the prediction of the dimensionless retention time (DRT) of proteins in hydrophobic interaction chromatography (HIC) by means of mathematical models based on characteristics of the surface hydrophobicity distribution. We introduce a new parameter, called hydrophobic imbalance (HI), obtained from the three-dimensional structure of proteins. This parameter quantifies the displacement of the superficial geometric centre of the protein when the effect of the hydrophobicity of each amino acid is considered. This parameter is simpler and less expensive than those reported previously. We use HI as a way to incorporate information about the surface hydrophobicity distribution in order to improve the prediction of DRT. We tested the performance of our DRT predictive models on a set of 15 proteins. This set includes four proteins whose DRTs are known to be very difficult to predict. By means of the variable HI, it was possible to improve the predictive characteristics obtained by models based on the average surface hydrophobicity (ASH) by 9.1%. We also studied linear multivariable models based on characteristics determined from the HI. Using such a multivariable model, a correlation coefficient of 0.899 was obtained. With this model, we managed to improve the predictive characteristics shown by previous models based on ASH by 31.8%.
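
One way to read the description of HI is as the distance between the plain centroid of the surface residues and their hydrophobicity-weighted centroid. The sketch below implements that reading; it is an assumption based on the wording of the abstract, not the authors' definition, and the function name, input layout and the requirement of non-negative weights are all hypothetical.

```python
import math


def hydrophobic_imbalance(points):
    """Distance between the unweighted and hydrophobicity-weighted centroids.

    points: list of ((x, y, z), hydrophobicity) pairs for surface residues.
    Assumes non-negative hydrophobicity values with a non-zero sum.
    """
    n = len(points)
    # Plain geometric centre of the surface points.
    geometric = [sum(coords[k] for coords, _ in points) / n for k in range(3)]
    # Centre after weighting each point by its hydrophobicity.
    weight_sum = sum(h for _, h in points)
    weighted = [sum(coords[k] * h for coords, h in points) / weight_sum
                for k in range(3)]
    return math.dist(geometric, weighted)


if __name__ == "__main__":
    surface = [((1.0, 0.0, 0.0), 3.0), ((-1.0, 0.0, 0.0), 1.0)]
    print(hydrophobic_imbalance(surface))
```

A surface with evenly distributed hydrophobicity gives a value of zero; the value grows as hydrophobic residues concentrate on one side.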

4.
Solvent accessibility prediction from amino acid sequences has been pursued by several researchers. Such a prediction typically starts by transforming the amino acid category (or type) information into numerical representations. All twenty amino acids can be completely and uniquely represented by 20-dimensional vectors. Here, we investigate whether the amino acid space defined in this way really requires twenty dimensions. We tried to develop corresponding representations in fewer dimensions. A method for searching for optimal codification schemes in an arbitrary space using neural networks was developed. The method is used to obtain optimal encodings of amino acids at various levels of dimensionality, and is applied to optimize the amino acid codifications for the prediction of the solvent accessibility values of proteins using feed-forward neural networks. The traditional 20-dimensional codification seems to be redundant for the solvent accessibility prediction problem, since a 1-dimensional codification is able to achieve almost the same degree of accuracy as the 20-dimensional codification. Optimal coding in far fewer dimensions could be used to predict accessible surface area with almost the same degree of accuracy as that obtained by a fully unique 20-dimensional coding. The 1-dimensional amino acid codification for solvent accessibility prediction, obtained in a purely mathematical way based on neural networks, is highly correlated with a physical property of the amino acids, namely their average solvent accessibility. The method developed to find the optimal codification is general, although the codification thus produced depends on the type of property being estimated.
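
The contrast between the two codifications can be sketched as follows. The 20-dimensional one-hot vector is the standard unique encoding; the 1-dimensional codification simply maps each residue to one number. The alphabet ordering and the `code` lookup table are assumptions for illustration, and no values from the paper are reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes, alphabetical order


def one_hot(residue):
    """Unique 20-dimensional encoding: a single 1 at the residue's index."""
    vector = [0] * len(AMINO_ACIDS)
    vector[AMINO_ACIDS.index(residue)] = 1
    return vector


def encode_sequence_20d(sequence):
    """Encode a sequence as a list of 20-dimensional one-hot vectors."""
    return [one_hot(residue) for residue in sequence]


def encode_sequence_1d(sequence, code):
    """Encode a sequence with one number per residue.

    code: hypothetical dict mapping each residue to a learned scalar.
    """
    return [code[residue] for residue in sequence]


if __name__ == "__main__":
    print(encode_sequence_20d("AC"))
```

A window of w residues therefore needs 20·w network inputs under the one-hot scheme but only w inputs under a 1-dimensional scheme.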

5.
For predicting solvent accessibility from the sequence of amino acids in proteins, we use a logistic function trained on a non-redundant protein database. Using a principal component analysis, we find that the prediction can be considered, to a good approximation, as a monofactorial problem: a crossed effect of the burial propensity of amino acids and of their locations at positions flanking the amino acid of interest. Complementary effects depend on the presence of certain amino acids (mostly P, G and C) at given positions. We have refined the predictive model (1) by adding supplementary input data, (2) by using a strategy of prediction correction and (3) by adapting the decision rules according to the amino acid type. We obtain a best score of 77.6% correct prediction for a relative accessibility of 9%. However, compared to a trivial strategy based only on the frequencies of buried or exposed residues, the gain is less than 4%. Received: 4 June 1998 / Accepted: 17 September 1998 / Published online: 10 December 1998

6.
Hydrophobicity is one of the most important physicochemical properties of proteins. Moreover, it plays a fundamental role in hydrophobic interaction chromatography, a separation technique that, at present, is used in most industrial processes for protein purification as well as in laboratory-scale applications. Although there are many ways of assessing the hydrophobicity value of a protein, it has recently been shown that the average surface hydrophobicity (ASH) is an important tool in the area of protein separation and purification, particularly in protein chromatography. The ASH is calculated based on the hydrophobic characteristics of each class of amino acid present on the protein surface. The hydrophobic characteristics of the amino acids are determined by an amino acid hydrophobicity scale. In this work, the scales of Cowan-Whittaker and Berggren were studied. However, to calculate the ASH, it is necessary to have the three-dimensional protein structure. Frequently these data do not exist, and the only information available is the amino acid sequence. In these cases it would be desirable to estimate the ASH based only on properties extracted from the protein sequence. It was found that it is possible to predict the ASH of a protein to a level acceptable for many practical applications (correlation coefficient > 0.8) using only the amino acid composition. Two predictive tools were built: one based on a simple linear model and the other on a neural network. Both tools were constructed starting from the analysis of a set of 1982 non-redundant proteins. The linear model was able to predict the ASH for an independent subset with a correlation coefficient of 0.769 for the Cowan-Whittaker scale and 0.803 for the Berggren scale. The neural model improved on the results of the linear model, obtaining correlation coefficients of 0.831 and 0.836, respectively. The neural model was somewhat more robust than the linear model, particularly as it gave similar correlation coefficients for both hydrophobicity scales tested; moreover, the observed variabilities did not exceed 6.1% of the mean square error. Finally, we tested our models on a set of nine proteins with known retention time in hydrophobic interaction chromatography. We found that both models can predict this retention time with correlation coefficients only slightly lower (by 11.5% and 5.5% for the linear and the neural network models, respectively) than those of models that use information about the three-dimensional structure of proteins.

7.
8.
9.
Unlike all-helical membrane proteins, beta-barrel membrane proteins cannot be successfully discriminated from other proteins, especially from all-beta soluble proteins. This paper analyses the amino acid composition in the membrane parts of 12 beta-barrel membrane proteins versus the beta-strands of 79 all-beta soluble proteins. The average and variance of the amino acid composition in these two classes are calculated. Amino acids such as Gly, Asn and Val that are most likely associated with classification are selected based on Fisher's discriminant ratio. A linear classifier built on the composition of these selected amino acids in observed beta-strands achieves 100% classification accuracy for the 12 membrane proteins and 79 soluble proteins in a four-fold cross-validation experiment. Since the accuracy of secondary structure prediction is at present quite high, a promising method to identify beta-barrel membrane proteins is presented, based on the linear classifier coupled with predicted secondary structure. Applied to 241 beta-barrel membrane proteins and 3855 soluble proteins with various structures, the method achieves 85.48% sensitivity (206/241) and 92.53% specificity (3567/3855).
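
For a single feature and two classes, Fisher's discriminant ratio is commonly written as the squared difference of the class means divided by the sum of the class variances. The sketch below uses that common form to rank features; whether the paper uses sample or population variance is not stated, so the choice of sample variance is an assumption, and the composition values are invented toy data.

```python
from statistics import mean, variance


def fisher_ratio(class_a, class_b):
    """(mean_a - mean_b)^2 / (var_a + var_b), using sample variances."""
    return (mean(class_a) - mean(class_b)) ** 2 / (
        variance(class_a) + variance(class_b))


def rank_features(features):
    """Sort feature names by decreasing Fisher ratio.

    features: dict mapping a feature name to (values_in_a, values_in_b).
    """
    return sorted(features, key=lambda name: fisher_ratio(*features[name]),
                  reverse=True)


if __name__ == "__main__":
    # Invented per-protein composition fractions, for illustration only.
    toy = {
        "Gly": ([0.12, 0.14, 0.13], [0.06, 0.07, 0.08]),
        "Ala": ([0.08, 0.10, 0.06], [0.07, 0.09, 0.08]),
    }
    print(rank_features(toy))
```

Features whose class means are far apart relative to their spread rank first, which is the selection criterion the abstract describes.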

10.
Proteins are classified mainly on the basis of alignments of amino acid sequences. Drug discovery processes based on pharmacologically important proteins such as G-protein-coupled receptors (GPCRs) may be facilitated if more information is extracted directly from the primary sequences. Here, we investigate an alignment-free approach to protein classification using self-organizing maps (SOMs), a kind of artificial neural network, which needs only primary sequences of proteins and determines their relative locations in a two-dimensional lattice of neurons through an adaptive process. We first showed that a set of 1397 aligned samples of Class A GPCRs can be classified by our SOM program into 15 conventional categories with 99.2% accuracy. Similarly, a nonaligned raw sequence data set of 4116 samples was categorized into 15 conventional families with 97.8% accuracy in a cross-validation test. Orphan GPCRs were also classified appropriately using the result of the SOM learning. A supposedly diverse family of olfactory receptors formed the most distinctive cluster in the map, whereas amine and peptide families exhibited diffuse distributions. A feature of this kind in the map can be interpreted to reflect hierarchical family composition. Interestingly, some orphan receptors that were categorized as olfactory were somatosensory chemoreceptors. These results suggest the applicability and potential of the SOM program to classification prediction and knowledge discovery from protein sequences.

11.
At present, there are a number of methods for the prediction of T-cell epitopes and major histocompatibility complex (MHC)-binding peptides. Despite these numerous methods, limitations remain that affect the reliability of prevailing approaches. For this reason, the development of models with high accuracy is crucial. An accurate prediction of the peptides that bind to specific major histocompatibility complex class I and II (MHC-I and MHC-II) molecules is important for an understanding of the functioning of the immune system and the development of peptide-based vaccines. Peptide binding is the most selective step in identifying T-cell epitopes. In this paper, we present a new approach to predicting MHC-binding ligands that takes into account new weighting schemes for position-based amino acid frequencies, BLOSUM and VOGG substitution of amino acids, and the physicochemical and molecular properties of amino acids. We have built models for quantitatively and qualitatively predicting MHC-binding ligands. Our models are based on two machine learning methods, support vector machine (SVM) and support vector regression (SVR), and use feature selection together with several different encoding and weighting schemes for peptides. The resulting models showed performance comparable to, and in some cases better than, the best existing predictors. The results indicate that the physicochemical and molecular properties of amino acids (AA) contribute significantly to the peptide-binding affinity.

12.
A novel representation of proteins is introduced. It is independent of arbitrary decisions with respect to the choice of labels to be assigned to the 20 natural amino acids. The approach is based on an assignment of 20 unit vectors in 20-dimensional vector space to the 20 natural amino acids. Proteins are then represented by a walk, that is, a sequence of steps in the 20-dimensional space analogous to a walk in the (x, y) plane in the case of binary strings. A straightforward numerical characterization of proteins is obtained from the distance matrix associated with the walk representing the protein in 20-dimensional space, combining the information on the Euclidean distance between various amino acids in the protein sequence. The Line Distance matrix offers additional numerical characterization of proteins, while the lengths of steps of the walk in 20-D space allow construction of a "protein profile," which represents the distribution of average lengths of the steps and their powers.
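
The walk described above can be sketched by letting each residue move one unit along its own axis and recording the position after every step; the Euclidean distance matrix is then computed between the recorded positions. This is one plausible reading of the abstract, not the authors' exact construction, and the axis ordering is an arbitrary assumption (the representation is meant to be independent of it).

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # axis order is arbitrary here


def protein_walk(sequence):
    """Positions visited in 20-D space, one unit step per residue."""
    position = [0] * len(AMINO_ACIDS)
    path = []
    for residue in sequence:
        position[AMINO_ACIDS.index(residue)] += 1  # step along that axis
        path.append(tuple(position))
    return path


def distance_matrix(path):
    """Euclidean distances between every pair of walk positions."""
    return [[math.dist(p, q) for q in path] for p in path]


if __name__ == "__main__":
    for row in distance_matrix(protein_walk("ACA")):
        print(row)
```

The matrix is symmetric with a zero diagonal, and entries depend only on how many residues of each type lie between two positions in the sequence.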

13.
Ongoing efforts to improve protein structure prediction stimulate the development of scoring functions and methods for model quality assessment (MQA) that can be used to rank and select the best protein models for further refinement. In this work, sequence-based prediction of relative solvent accessibility (RSA) is employed as a basis for a simple MQA method for soluble proteins, and subsequently extended to the much less explored case of (alpha-helical) membrane proteins. In analogy to soluble proteins, the level of exposure to the lipid of amino acid residues in transmembrane (TM) domains is captured in terms of the relative lipid accessibility (RLA), which is predicted from sequence using low-complexity Support Vector Regression (SVR) models. On an independent set of 23 TM proteins, the new SVR-based predictor yields a correlation coefficient (CC) of 0.56 between the predicted and observed RLA profiles, as opposed to a CC of 0.13 for a baseline predictor that utilizes the TMLIP2H empirical lipophilicity scale (with standard deviations of about 0.15). A simple MQA approach is then defined by ranking models of membrane proteins in terms of consistency between predicted and observed RLA profiles, as a measure of similarity to the native structure. The new method does not require a set of decoy models to optimize parameters, circumventing current limitations in this regard. Several different sets of models, including those generated by fragment-based folding simulations, and decoys obtained by swapping TM helices to mimic errors in template-based assignment, are used to assess the new approach. Predicted RLA profiles can be used to successfully discriminate near-native models from non-native decoys in most cases, significantly improving the separation of correctly and incorrectly folded models compared to a simple baseline approach that utilizes TMLIP2H. As suggested by the robust performance of a simple MQA method for soluble proteins that utilizes more accurate RSA predictions, further significant improvements are likely to be achieved. The steady growth in the number of resolved membrane protein structures is expected to yield enhanced RLA predictions, facilitating further efforts to improve de novo and template-based prediction of membrane protein structure.

14.
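The ubiquitin-proteasome system plays an extremely important role in physiological processes in eukaryotes such as antigen presentation, cell-cycle regulation and transcription factor activation. At its core is the selective cleavage of substrates by the proteasome, so the prediction of selective cleavage sites has long been a key research topic in computational biology. To address problems of existing cleavage-site prediction methods, such as nonlinearity and unclear physical meaning, and drawing on quantitative structure-activity relationship (QSAR) methodology, the source protein sequences of 2650 collected MHC-I ligands were structurally characterized using VHSE (principal component score vector of hydrophobic, steric, and electronic properties), a descriptor based on the physicochemical properties of amino acids. On this basis, a prediction model for proteasome cleavage sites was built using a support vector machine. The best linear model achieved a sensitivity of 90.18%, a specificity of 69.63%, an area under the receiver operating characteristic curve (AUC) of 0.8797 and a Matthews correlation coefficient (MCC) of 0.6131. Analysis of the model shows that the amino acid properties affecting cleavage-site selectivity are, in decreasing order of influence, hydrophobicity, electronic properties and steric features. The amino acids at positions P9, P8, P4, P1, P3', P4' and P5' strongly influence cleavage-site selection. The study also shows that a "hydrophobic potential difference" between the upstream P1 position and the downstream P1' to P5' positions favours proteasomal cleavage.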
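
For reference, sensitivity, specificity and MCC are all derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts in the usage example are invented, since the abstract reports only the final percentages and not the underlying confusion matrix.

```python
import math


def sensitivity(tp, fn):
    """Fraction of true cleavage sites that are predicted as such."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-cleavage sites that are predicted as such."""
    return tn / (tn + fp)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if any margin is empty."""
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(sensitivity(90, 10), specificity(70, 30), mcc(90, 70, 30, 10))
```

Unlike accuracy, the MCC stays informative when the two classes differ greatly in size, which is why it is often reported alongside sensitivity and specificity.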

15.
Implicit solvent hydration free energy models are an important component of most modern computational methods aimed at protein structure prediction, binding affinity prediction, and modeling of conformational equilibria. The nonpolar component of the hydration free energy, consisting of a repulsive cavity term and an attractive van der Waals solute-solvent interaction term, is often modeled using estimators based on the solvent exposed solute surface area. In this paper, we analyze the accuracy of linear surface area models for predicting the van der Waals solute-solvent interaction energies of native and non-native protein conformations, peptides and small molecules, and the desolvation penalty of protein-protein and protein-ligand binding complexes. The target values are obtained from explicit solvent simulations and from a continuum solvent van der Waals interaction energy model. The results indicate that the standard surface area model, while useful on a coarse-grained scale, may not be accurate or transferable enough for high resolution modeling studies of protein folding and binding. The continuum model constructed in the course of this study provides one path for the development of a computationally efficient implicit solvent nonpolar hydration free energy estimator suitable for high-resolution structural and thermodynamic modeling of biological macromolecules.

16.
Unlike all-helical membrane proteins, β-barrel membrane proteins cannot be successfully discriminated from other proteins, especially from all-β soluble proteins. This paper analyses the amino acid composition in the membrane parts of 12 β-barrel membrane proteins versus the β-strands of 79 all-β soluble proteins. The average and variance of the amino acid composition in these two classes are calculated. Amino acids such as Gly, Asn and Val that are most likely associated with classification are selected based on Fisher's discriminant ratio. A linear classifier built on the composition of these selected amino acids in observed β-strands achieves 100% classification accuracy for the 12 membrane proteins and 79 soluble proteins in a four-fold cross-validation experiment. Since the accuracy of secondary structure prediction is at present quite high, a promising method to identify β-barrel membrane proteins is presented, based on the linear classifier coupled with predicted secondary structure. Applied to 241 β-barrel membrane proteins and 3855 soluble proteins with various structures, the method achieves 85.48% sensitivity (206/241) and 92.53% specificity (3567/3855).

17.
18.
In this work, we developed artificial intelligence-based models for prediction and correlation of CO2 solubility in amino acid solutions for the purpose of CO2 capture. The models were used to correlate the process parameters to the CO2 loading in the solvent; CO2 loading (solubility) in the solvent was the models' sole output. The solvents studied in this work were potassium- and sodium-based amino acid salt solutions. For the predictions, we tried three candidate models: Multi-layer Perceptron (MLP), Decision Tree (DT), and AdaBoost-DT. To find the best hyperparameters for each model, we ran the method multiple times. R2 scores for all three models exceeded 0.9 after optimization, confirming the good prediction capability of all models. AdaBoost-DT gave the highest R2 score, 0.998. With an R2 of 0.98, Decision Tree was the second most accurate, followed by MLP with an R2 of 0.9.
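
The R2 score used to compare the three models is the coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses that standard definition; the function name and toy values are illustrative and are not taken from the paper's data.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0]
    print(r2_score(measured, [1.0, 2.0, 3.0]))  # perfect prediction
    print(r2_score(measured, [2.0, 2.0, 2.0]))  # always predicts the mean
```

A score of 1 means the predictions match the measurements exactly, while a model that always predicts the mean of the measurements scores 0.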

19.
Fluid Phase Equilibria, 2006, 240(1): 40-45
In this paper, the activity coefficients of amino acids and simple peptides in aqueous solutions were correlated using a three-parameter model based on perturbation theory. The adjustable parameters of this model were obtained from experimental data, and a relation for calculating the activity coefficient is derived. The calculated activity coefficients of amino acids and simple peptides show that the equation of state based on the perturbation model can correlate the activity coefficients of amino acids more accurately than the other models. A correlation for the solubility of amino acids in aqueous solutions is also derived. The results show that this correlation can accurately correlate the solubility of amino acids in aqueous solution over a wide range of temperatures (0–100 °C).

20.
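Transmembrane region prediction based on fuzzy clustering analysis of amino acids   Total citations: 2, self-citations: 0, citations by others: 2
Deng Yong, Liu Qi, Li Yixue. 《化学学报》 (Acta Chimica Sinica), 2004, 62(19): 1968-1972
Transmembrane proteins show poor sequence conservation during evolution, and even homologous proteins have low sequence identity. In transmembrane-region prediction algorithms, therefore, selecting the training set by sequence identity does not effectively eliminate overfitting of the predictions to the training set. This paper proposes a prediction algorithm based on fuzzy clustering analysis of amino acids: amino acids are fuzzily clustered according to the similarity of their distributions across the different regions, so that transmembrane regions are predicted from the distribution characteristics of a class of amino acids rather than of each individual amino acid. The results show that the method can, to some extent, eliminate the influence of training-set selection on the test results and improve the accuracy of transmembrane protein topology prediction, especially for transmembrane proteins about which little is currently known.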
