1.
Kernel partial least squares (KPLS) and support vector regression (SVR) have become popular techniques for regression of complex non-linear data sets. The modeling is performed by mapping the data into a higher-dimensional feature space through the kernel transformation. The disadvantage of such a transformation, however, is that information about the contribution of the original variables to the regression is lost. In this paper we introduce a method which can retrieve and visualize both the contribution of the variables to the regression model and the way the variables contribute to the regression of complex data sets. The method is based on the visualization of trajectories using so-called pseudo samples representing the original variables in the data. We test and illustrate the proposed method on several synthetic and real benchmark data sets. The results show that for linear and non-linear regression models the important variables were identified with corresponding linear or non-linear trajectories. The results were verified by comparing with ordinary PLS regression and by selecting those variables which were indicated as important and rebuilding a model with only those variables.
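The pseudo-sample idea in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: it swaps KPLS/SVR for a plain RBF kernel ridge regression written in pure Python, and it assumes mean-centered variables so that each pseudo sample sweeps one variable while holding the others at 0. All function names, the kernel width, and the ridge parameter are hypothetical choices for illustration.

```python
import math

def rbf(a, b, gamma=1.0):
    # Gaussian (RBF) kernel between two feature vectors.
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_kernel_ridge(X, y, lam=1e-3):
    # Stand-in for KPLS/SVR (assumption): solve (K + lam*I) alpha = y.
    K = [[rbf(xi, xj) + (lam if i == j else 0.0)
          for j, xj in enumerate(X)] for i, xi in enumerate(X)]
    alpha = solve(K, y)
    return lambda x: sum(a * rbf(x, xi) for a, xi in zip(alpha, X))

def trajectory(predict, n_vars, var, values):
    # Pseudo samples: variable `var` sweeps `values`, all others stay at 0.
    out = []
    for v in values:
        pseudo = [0.0] * n_vars
        pseudo[var] = v
        out.append(predict(pseudo))
    return out

# Synthetic data: the response depends non-linearly on variable 0 only.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
X = [[a, b] for a in grid for b in grid]
y = [a * a for a, b in X]

predict = fit_kernel_ridge(X, y)
sweep = [i / 10 - 1 for i in range(21)]
traj0 = trajectory(predict, 2, 0, sweep)  # expected to be curved
traj1 = trajectory(predict, 2, 1, sweep)  # expected to be nearly flat
span0 = max(traj0) - min(traj0)
span1 = max(traj1) - min(traj1)
```

Plotting `traj0` and `traj1` against `sweep` would give one trajectory per original variable; a wide, curved trajectory suggests an important variable with a non-linear contribution, while a flat one suggests little contribution.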
2.
3.
Kumar P Ma X Liu X Jia J Bucong H Xue Y Li ZR Yang SY Wei YQ Chen YZ 《Journal of computer-aided molecular design》2011,25(5):455-467
Various in vitro and in-silico methods have been used for drug genotoxicity tests, which show limited genotoxicity (GT+) and non-genotoxicity (GT−) identification rates. New methods and combinatorial approaches have been explored for enhanced collective identification capability. The rates of in-silico methods may be further improved by significantly diversified training data enriched by the large number of recently reported GT+ and GT− compounds, but a major concern is the increased noise levels arising from high false-positive rates of in vitro data. In this work, we evaluated the effect of training data size and noise level on the performance of the support vector machine (SVM) method, which is known to tolerate high noise levels in training data. Two SVMs of different diversity/noise levels were developed and tested. H-SVM trained by higher-diversity, higher-noise data (GT+ in any in vivo or in vitro test) outperforms L-SVM trained by lower-noise, lower-diversity data (GT+ in in vivo or Ames test only). H-SVM trained by 4,763 GT+ compounds reported before 2008 and 8,232 GT− compounds excluding clinical trial drugs correctly identified 81.6% of the 38 GT+ compounds reported since 2008, predicted 83.1% of the 2,008 clinical trial drugs as GT−, and 23.96% of 168K MDDR and 27.23% of 17.86M PubChem compounds as GT+. These are comparable to the 43.1–51.9% GT+ and 75–93% GT− rates of existing in-silico methods, 58.8% GT+ and 79% GT− rates of the Ames method, and the estimated percentages of 23% in vivo and 31–33% in vitro GT+ compounds in the “universe of chemicals”. There is a substantial level of agreement between H-SVM and L-SVM predicted GT+ and GT− MDDR compounds and the prediction from TOPKAT. SVM showed good potential in identifying GT+ compounds from large compound libraries based on higher-diversity and higher-noise training data.
4.
5.
6.
7.
8.
9.
10.
Rella M Rushworth CA Guy JL Turner AJ Langer T Jackson RM 《Journal of chemical information and modeling》2006,46(2):708-716
The metallopeptidase Angiotensin Converting Enzyme (ACE) is an important drug target for the treatment of hypertension, heart, kidney, and lung disease. Recently, a close and unique human ACE homologue termed ACE2 has been identified and found to be an interesting new cardiorenal disease target. With the recently resolved inhibitor-bound ACE2 crystal structure available, we have attempted a structure-based approach to identify novel potent and selective inhibitors. Computational approaches focus on pharmacophore-based virtual screening of large compound databases. Selectivity was ensured by initial screening for ACE inhibitors within an internal database and the Derwent World Drug Index, which could be reduced to zero false positives and 0.1% hit rate, respectively. An average hit reduction of 0.44% was achieved with a five-feature hypothesis, searching approximately 3.8 million compounds from various commercial databases. Seventeen compounds were selected based on high fit values as well as diverse structure and subjected to experimental validation in a bioassay. We show that all compounds displayed an inhibitory effect on ACE2 activity, the six most promising candidates exhibiting IC50 values in the range of 62-179 μM.
11.
For a long time, structural understanding of the TXA2 receptor was limited by the lack of crystal structure information. The release of the crystal structure of the TXA2 receptor has deepened our understanding of the ligand recognition and selectivity mechanisms of this physiologically important receptor. In this research, we report the discovery of an optimal pharmacophore model of human TXA2 receptor antagonists through virtual screening. Structure-based pharmacophore models were generated based on two crystal structures of the human TXA2 receptor (PDB entries 6IIU and 6IIV). Docking simulation revealed the interaction modes of the virtual screening hits against the TXA2 receptor, which were validated through molecular dynamics simulation and binding free energy calculation. ADMET properties were also analyzed to evaluate the toxicity and physicochemical characteristics of the hits. The research provides valuable insight into the binding mechanisms of TXA2 receptor antagonists and should thus be helpful for designing novel antagonists.
12.
Diagnosing breast cancer based on support vector machines Total citations: 8, self-citations: 0, citations by others: 8
Liu HX Zhang RS Luan F Yao XJ Liu MC Hu ZD Fan BT 《Journal of chemical information and computer sciences》2003,43(3):900-907
The Support Vector Machine (SVM) classification algorithm, recently developed in the machine learning community, was used to diagnose breast cancer. At the same time, the SVM was compared to several machine learning techniques currently used in this field. The classification task involves predicting the state of diseases, using data obtained from the UCI machine learning repository. SVM outperformed k-means clustering and two artificial neural networks on the whole. The comparison of several machine learning techniques suggests that nine samples could be mislabeled.
13.
14.
Warmuth MK Liao J Rätsch G Mathieson M Putta S Lemmen C 《Journal of chemical information and computer sciences》2003,43(2):667-673
We investigate the following data mining problem from computer-aided drug design: From a large collection of compounds, find those that bind to a target molecule in as few iterations of biochemical testing as possible. In each iteration a comparatively small batch of compounds is screened for binding activity toward this target. We employed the so-called "active learning paradigm" from Machine Learning for selecting the successive batches. Our main selection strategy is based on the maximum margin hyperplane generated by "Support Vector Machines". This hyperplane separates the current set of active from the inactive compounds and has the largest possible distance from any labeled compound. We perform a thorough comparative study of various other selection strategies on data sets provided by DuPont Pharmaceuticals and show that the strategies based on the maximum margin hyperplane clearly outperform the simpler ones.
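As a rough illustration of hyperplane-based batch selection, the sketch below ranks unlabeled compounds by their distance to a linear decision boundary and picks the closest ones for the next round of testing. This is only one plausible selection rule; the abstract does not specify which hyperplane-based strategies were compared. The weight vector `w`, bias `b`, compound names, and feature values are all hypothetical, and in practice the hyperplane would come from an SVM trained on the compounds labeled so far.

```python
def decision_value(w, b, x):
    # Signed score of x relative to the hyperplane w.x + b = 0.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def select_batch(w, b, unlabeled, batch_size):
    # Pick the compounds whose scores are closest to the hyperplane,
    # i.e. the ones the current model is least certain about.
    ranked = sorted(unlabeled,
                    key=lambda item: abs(decision_value(w, b, item[1])))
    return [name for name, _ in ranked[:batch_size]]

# Hypothetical hyperplane and unlabeled pool (name, feature vector).
w, b = [1.0, -1.0], 0.0
pool = [
    ("c1", [2.0, 0.0]),   # score  2.0
    ("c2", [0.1, 0.0]),   # score  0.1
    ("c3", [1.0, 1.2]),   # score -0.2
    ("c4", [-3.0, 1.0]),  # score -4.0
]
next_batch = select_batch(w, b, pool, batch_size=2)
```

After the selected batch is tested, its labels would be added to the training set and the hyperplane refit before the next selection round.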
15.
16.
New leads for selective GSK-3 inhibition: pharmacophore mapping and virtual screening studies Total citations: 2, self-citations: 0, citations by others: 2
Summary Glycogen Synthase Kinase-3 is a regulatory serine/threonine kinase, which is being targeted for the treatment of a number of human diseases including type-2 diabetes mellitus, neurodegenerative diseases, cancer and chronic inflammation. Selective GSK-3 inhibition is an important requirement owing to the possibility of side effects arising from other kinases. A pharmacophore mapping strategy is employed in this work to identify new leads for selective GSK-3 inhibition. Ligands known to show selective GSK-3 inhibition were employed in generating a pharmacophore map using the distance comparison method (DISCO). The derived pharmacophore map was validated using (i) important interactions involved in selective GSK-3 inhibition, and (ii) an in-house database containing different classes of GSK-3 selective, non-selective and inactive molecules. New lead identification was carried out by performing virtual screening using the validated pharmacophoric query and three chemical databases, namely NCI, Maybridge and Leadquest. Further data reduction was carried out by employing virtual filters based on (i) Lipinski’s rule of 5, (ii) van der Waals bumps and (iii) restricting the number of rotatable bonds to seven. Final screening was carried out using a FlexX-based molecular docking study.
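Two of the virtual filters named above are simple enough to sketch on precomputed descriptors. The example below assumes a strict reading of Lipinski's rule of 5 (all four criteria must hold; the abstract does not say whether violations were tolerated) and a cap of seven rotatable bonds. The van der Waals bump filter is omitted because it needs 3D coordinates. The `Compound` class and the descriptor values are hypothetical, not data from the study.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    mol_weight: float      # molecular weight in Da
    logp: float            # octanol-water partition coefficient
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    rotatable_bonds: int

def passes_lipinski(c):
    # Lipinski's rule of 5, strict form (assumption): no violations allowed.
    return (c.mol_weight <= 500 and c.logp <= 5
            and c.h_donors <= 5 and c.h_acceptors <= 10)

def passes_filters(c, max_rotatable=7):
    # Combine the rule of 5 with the rotatable-bond limit.
    return passes_lipinski(c) and c.rotatable_bonds <= max_rotatable

# Illustrative, made-up descriptor values.
hits = [
    Compound("cmpd_a", 350.0, 2.1, 2, 5, 4),   # passes both filters
    Compound("cmpd_b", 620.0, 3.0, 2, 6, 5),   # too heavy
    Compound("cmpd_c", 410.0, 3.0, 1, 6, 9),   # too flexible
]
kept = [c.name for c in hits if passes_filters(c)]
```

In a real workflow the descriptors would be computed by a cheminformatics toolkit before filtering, and the surviving compounds would move on to docking.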
17.
18.
A new method has been developed for prediction of transmembrane helices using support vector machines. Different coding schemes of protein sequences were explored, and their performances were assessed by cross-validation tests. The best-performing method can predict the transmembrane helices with sensitivity of 93.4% and precision of 92.0%. For each predicted transmembrane segment, a score is given to show the strength of transmembrane signal and the prediction reliability. In particular, this method can distinguish transmembrane proteins from soluble proteins with an accuracy of approximately 99%. This method can be used to complement current transmembrane helix prediction methods and can be used for consensus analysis of entire proteomes. The predictor is located at http://genet.imb.uq.edu.au/predictors/SVMtm.
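For readers unfamiliar with the two metrics quoted above, the following sketch shows how sensitivity and precision are computed from prediction counts. The counts are hypothetical and are not taken from the paper; the abstract does not state whether the metrics were computed per residue or per segment.

```python
def sensitivity(true_pos, false_neg):
    # Fraction of real positives that were predicted as positive.
    return true_pos / (true_pos + false_neg)

def precision(true_pos, false_pos):
    # Fraction of positive predictions that were correct.
    return true_pos / (true_pos + false_pos)

# Hypothetical counts for illustration only.
tp, fn, fp = 90, 10, 30
sens = sensitivity(tp, fn)
prec = precision(tp, fp)
```

Sensitivity penalizes missed helices, while precision penalizes false predictions, so reporting both gives a fuller picture than accuracy alone.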
19.
20.
Pieter Van Renterghem Pierre-Edouard Sottas Martial Saugy Peter Van Eenoo 《Analytica chimica acta》2013
Due to their performance-enhancing properties, use of anabolic steroids (e.g. testosterone, nandrolone, etc.) is banned in elite sports. Therefore, doping control laboratories accredited by the World Anti-Doping Agency (WADA) screen for these prohibited substances, among others, in urine. It is particularly challenging to detect misuse of naturally occurring anabolic steroids such as testosterone (T), which is a popular ergogenic agent in sports and society.