Similar articles
 Found 20 similar articles; search took 15 ms
1.
Kernel partial least squares (KPLS) and support vector regression (SVR) have become popular techniques for regression of complex non-linear data sets. The modeling is performed by mapping the data into a higher-dimensional feature space through the kernel transformation. The disadvantage of such a transformation, however, is that information about the contribution of the original variables to the regression is lost. In this paper we introduce a method that can retrieve and visualize both the contribution of the variables to the regression model and the way the variables contribute to the regression of complex data sets. The method is based on the visualization of trajectories using so-called pseudo samples representing the original variables in the data. We test and illustrate the proposed method on several synthetic and real benchmark data sets. The results show that, for linear and non-linear regression models, the important variables were identified with corresponding linear or non-linear trajectories. The results were verified by comparison with ordinary PLS regression and by selecting the variables indicated as important and rebuilding a model with only those variables.
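The pseudo-sample idea in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: kernel ridge regression with an RBF kernel stands in for KPLS/SVR, the data are synthetic and mean-centered, and a pseudo sample is taken to be a point where one variable sweeps a grid while all others are held at zero. The helper names (`rbf`, `solve`, `fit_kernel_ridge`, `trajectory`) are hypothetical and are not taken from the paper.

```python
import math

def rbf(a, b, gamma=0.5):
    """RBF kernel between two points (gamma is an arbitrary illustrative choice)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_kernel_ridge(X, y, lam=1e-3):
    """Kernel ridge regression (a stand-in for KPLS/SVR): solve (K + lam*I) alpha = y."""
    K = [[rbf(a, b) + (lam if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    return solve(K, y)

def predict(X_train, alpha, x):
    return sum(a * rbf(xt, x) for a, xt in zip(alpha, X_train))

def trajectory(X_train, alpha, var_index, n_vars, grid):
    """Predict on pseudo samples: one variable sweeps the grid, the others stay at 0."""
    out = []
    for v in grid:
        pseudo = [0.0] * n_vars
        pseudo[var_index] = v
        out.append(predict(X_train, alpha, pseudo))
    return out

def span(values):
    return max(values) - min(values)

# Synthetic, mean-centered data: y depends non-linearly on x0; x1 is irrelevant.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
X = [[a, b] for a in grid for b in grid]
y = [a * a for a, _ in X]

alpha = fit_kernel_ridge(X, y)
traj_x0 = trajectory(X, alpha, 0, 2, grid)
traj_x1 = trajectory(X, alpha, 1, 2, grid)
print("x0 trajectory:", [round(t, 2) for t in traj_x0])
print("x1 trajectory:", [round(t, 2) for t in traj_x1])
```

Under these assumptions, the trajectory for `x0` should trace a curved, non-linear shape while the trajectory for `x1` stays comparatively flat, which is the kind of contrast the abstract describes between important and unimportant variables.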

2.
3.
Various in vitro and in-silico methods have been used for drug genotoxicity tests, which show limited genotoxicity (GT+) and non-genotoxicity (GT−) identification rates. New methods and combinatorial approaches have been explored for enhanced collective identification capability. The rates of in-silico methods may be further improved by significantly diversified training data enriched by the large number of recently reported GT+ and GT− compounds, but a major concern is the increased noise level arising from the high false-positive rates of in vitro data. In this work, we evaluated the effect of training data size and noise level on the performance of the support vector machine (SVM) method, which is known to tolerate high noise levels in training data. Two SVMs of different diversity/noise levels were developed and tested. H-SVM, trained on higher-diversity, higher-noise data (GT+ in any in vivo or in vitro test), outperforms L-SVM, trained on lower-noise, lower-diversity data (GT+ in in vivo or Ames test only). H-SVM trained on 4,763 GT+ compounds reported before 2008 and 8,232 GT− compounds excluding clinical trial drugs correctly identified 81.6% of the 38 GT+ compounds reported since 2008, and predicted 83.1% of the 2,008 clinical trial drugs as GT−, and 23.96% of 168K MDDR and 27.23% of 17.86M PubChem compounds as GT+. These are comparable to the 43.1–51.9% GT+ and 75–93% GT− rates of existing in-silico methods, the 58.8% GT+ and 79% GT− rates of the Ames method, and the estimated percentages of 23% in vivo and 31–33% in vitro GT+ compounds in the “universe of chemicals”. There is a substantial level of agreement between the GT+ and GT− MDDR compounds predicted by H-SVM and L-SVM and the predictions from TOPKAT. SVM showed good potential for identifying GT+ compounds in large compound libraries based on higher-diversity, higher-noise training data.

4.
5.
6.
7.
8.
9.
10.
The metallopeptidase Angiotensin Converting Enzyme (ACE) is an important drug target for the treatment of hypertension and of heart, kidney, and lung disease. Recently, a close and unique human ACE homologue termed ACE2 has been identified and found to be an interesting new cardiorenal disease target. With the recently resolved inhibitor-bound ACE2 crystal structure available, we have attempted a structure-based approach to identify novel potent and selective inhibitors. Computational approaches focus on pharmacophore-based virtual screening of large compound databases. Selectivity was ensured by initial screening for ACE inhibitors within an internal database and the Derwent World Drug Index, which could be reduced to zero false positives and a 0.1% hit rate, respectively. An average hit reduction of 0.44% was achieved with a five-feature hypothesis, searching approximately 3.8 million compounds from various commercial databases. Seventeen compounds were selected based on high fit values as well as diverse structure and subjected to experimental validation in a bioassay. We show that all compounds displayed an inhibitory effect on ACE2 activity, with the six most promising candidates exhibiting IC50 values in the range of 62–179 µM.

11.
For a long time, knowledge of the structural basis of the TXA2 receptor was limited by the lack of crystal structure information, until the release of the crystal structure of the TXA2 receptor deepened our understanding of the ligand recognition and selectivity mechanisms of this physiologically important receptor. In this research, we report the successful discovery of an optimal pharmacophore model of human TXA2 receptor antagonists through virtual screening. Structure-based pharmacophore models were generated based on two crystal structures of the human TXA2 receptor (PDB entries 6IIU and 6IIV). Docking simulation revealed the interaction modes of the virtual screening hits against the TXA2 receptor, which were validated through molecular dynamics simulation and binding free energy calculation. ADMET properties were also analyzed to evaluate the toxicity and physicochemical characteristics of the hits. The research provides valuable insight into the binding mechanisms of TXA2 receptor antagonists and should thus be helpful for designing novel antagonists.

12.
Diagnosing breast cancer based on support vector machines   Total citations: 8, self-citations: 0, citations by others: 8
The Support Vector Machine (SVM) classification algorithm, recently developed by the machine learning community, was used to diagnose breast cancer. At the same time, the SVM was compared with several machine learning techniques currently used in this field. The classification task involves predicting the state of disease, using data obtained from the UCI machine learning repository. On the whole, SVM outperformed k-means clustering and two artificial neural networks. From the comparison of the several machine learning techniques, it can be concluded that nine samples may be mislabeled.

13.
14.
Active learning with support vector machines in the drug discovery process   Total citations: 6, self-citations: 0, citations by others: 6
We investigate the following data mining problem from computer-aided drug design: from a large collection of compounds, find those that bind to a target molecule in as few iterations of biochemical testing as possible. In each iteration a comparatively small batch of compounds is screened for binding activity toward this target. We employed the so-called "active learning paradigm" from machine learning for selecting the successive batches. Our main selection strategy is based on the maximum margin hyperplane generated by "Support Vector Machines". This hyperplane separates the current set of active compounds from the inactive ones and has the largest possible distance from any labeled compound. We perform a thorough comparative study of various other selection strategies on data sets provided by DuPont Pharmaceuticals and show that the strategies based on the maximum margin hyperplane clearly outperform the simpler ones.
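One way a hyperplane-based batch selection step could look is sketched below. The abstract does not say exactly how the hyperplane is used to pick compounds, so the rule shown here (choose the unlabeled compounds closest to the current hyperplane) is an assumption, as are the weight vector `w`, the offset `b`, the two-dimensional descriptors, and the compound identifiers. No SVM is trained; the hyperplane is simply taken as given.

```python
import math

def distance_to_hyperplane(x, w, b):
    """Geometric distance of descriptor vector x from the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm

def select_batch(unlabeled, w, b, batch_size):
    """Return the ids of the batch_size unlabeled compounds nearest the hyperplane."""
    ranked = sorted(unlabeled, key=lambda item: distance_to_hyperplane(item[1], w, b))
    return [compound_id for compound_id, _ in ranked[:batch_size]]

# Hypothetical hyperplane from the current round, and a hypothetical unlabeled pool.
w, b = [1.0, -1.0], 0.0
pool = [
    ("c1", [2.0, 0.0]),
    ("c2", [0.5, 0.4]),
    ("c3", [-3.0, 1.0]),
    ("c4", [1.0, 1.2]),
]

batch = select_batch(pool, w, b, batch_size=2)
print(batch)  # ['c2', 'c4']
```

In a full loop, the selected batch would be tested, its labels added to the training set, and the hyperplane refit before the next selection; other strategies, such as picking the compounds farthest on the active side, would only change the sort key.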

15.
16.
Summary: Glycogen Synthase Kinase-3 (GSK-3) is a regulatory serine/threonine kinase that is being targeted for the treatment of a number of human diseases, including type-2 diabetes mellitus, neurodegenerative diseases, cancer and chronic inflammation. Selective GSK-3 inhibition is an important requirement owing to the possibility of side effects arising from the inhibition of other kinases. A pharmacophore mapping strategy is employed in this work to identify new leads for selective GSK-3 inhibition. Ligands known to show selective GSK-3 inhibition were employed in generating a pharmacophore map using the distance comparison method (DISCO). The derived pharmacophore map was validated using (i) important interactions involved in selective GSK-3 inhibition, and (ii) an in-house database containing different classes of GSK-3 selective, non-selective and inactive molecules. New lead identification was carried out by performing virtual screening using the validated pharmacophoric query and three chemical databases, namely NCI, Maybridge and Leadquest. Further data reduction was carried out by employing virtual filters based on (i) Lipinski’s rule of 5, (ii) van der Waals bumps, and (iii) restricting the number of rotatable bonds to seven. Final screening was carried out using a FlexX-based molecular docking study.
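The property-based filters mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the `Compound` record, the compound names and all descriptor values are hypothetical; the rule of 5 is applied strictly (no violations tolerated), which the abstract does not specify; and the van der Waals bump filter is omitted because it needs 3D coordinates.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    mol_weight: float     # molecular weight in Da
    logp: float           # octanol-water partition coefficient
    h_donors: int         # hydrogen-bond donors
    h_acceptors: int      # hydrogen-bond acceptors
    rotatable_bonds: int

def passes_lipinski(c: Compound) -> bool:
    """Strict rule of 5: MW <= 500, logP <= 5, donors <= 5, acceptors <= 10."""
    return (c.mol_weight <= 500 and c.logp <= 5
            and c.h_donors <= 5 and c.h_acceptors <= 10)

def passes_filters(c: Compound, max_rotatable_bonds: int = 7) -> bool:
    """Combine the rule of 5 with the rotatable-bond limit of seven."""
    return passes_lipinski(c) and c.rotatable_bonds <= max_rotatable_bonds

# Hypothetical screening hits with made-up descriptor values.
hits = [
    Compound("cpd_A", 342.4, 2.1, 2, 5, 4),
    Compound("cpd_B", 612.7, 6.3, 4, 9, 6),   # fails MW and logP
    Compound("cpd_C", 410.5, 3.8, 1, 6, 11),  # fails the rotatable-bond limit
]

kept = [c.name for c in hits if passes_filters(c)]
print(kept)  # ['cpd_A']
```

In practice the descriptors would come from a cheminformatics toolkit rather than being typed in by hand, and a more lenient variant would allow a single rule-of-5 violation.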

17.
18.
A new method has been developed for prediction of transmembrane helices using support vector machines. Different coding schemes of protein sequences were explored, and their performances were assessed by cross-validation tests. The best-performing method can predict the transmembrane helices with a sensitivity of 93.4% and a precision of 92.0%. For each predicted transmembrane segment, a score is given to show the strength of the transmembrane signal and the prediction reliability. In particular, this method can distinguish transmembrane proteins from soluble proteins with an accuracy of approximately 99%. This method can be used to complement current transmembrane helix prediction methods and can be used for consensus analysis of entire proteomes. The predictor is located at http://genet.imb.uq.edu.au/predictors/SVMtm.
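For readers unfamiliar with the two metrics quoted above, the short sketch below shows how sensitivity and precision are computed from counts of correct and incorrect predictions. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data, and the abstract does not state whether its figures are per residue or per segment.

```python
def sensitivity_and_precision(true_positives, false_negatives, false_positives):
    """Return (sensitivity, precision) from prediction counts."""
    # Sensitivity: share of real transmembrane items that were predicted.
    sensitivity = true_positives / (true_positives + false_negatives)
    # Precision: share of predicted transmembrane items that are real.
    precision = true_positives / (true_positives + false_positives)
    return sensitivity, precision

# Hypothetical counts for illustration only.
sens, prec = sensitivity_and_precision(
    true_positives=90, false_negatives=10, false_positives=30
)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")  # sensitivity=0.90 precision=0.75
```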

19.
20.
Due to their performance-enhancing properties, the use of anabolic steroids (e.g. testosterone and nandrolone) is banned in elite sports. Therefore, doping control laboratories accredited by the World Anti-Doping Agency (WADA) screen urine for these prohibited substances, among others. It is particularly challenging to detect misuse of naturally occurring anabolic steroids such as testosterone (T), which is a popular ergogenic agent in sports and society.
