Similar articles
 20 similar articles found (search time: 31 ms)
1.
We describe a method for modeling chemical mutagenicity in terms of simple rules based on molecular features. A classification model was built using a rule-based ensemble method called RuleFit, developed by Friedman and Popescu. We show that its performance compares favorably with literature methods. Performance was measured by cross-validation and by testing on external test sets. All data sets used are publicly available. The method automatically generated transparent rules in terms of molecular structure that agree well with known toxicology. Although we have focused on chemical mutagenicity in demonstrating this method, we anticipate that it may be more generally useful for modeling other molecular properties, such as other types of chemical toxicity.

2.
3.
4.
5.
Decision support for selecting suitable QSARs for predictive purposes is provided by a stepwise procedure. The first tier pre-filters the compounds based on substructure indicators for baseline versus excess toxicity. This step, if sufficiently conservative, discriminates chemicals whose toxicity can be reliably estimated from their log KOW from those that require further classification by biological and chemical domain. A test set of 115 chemicals from 9 different MOA classes was used to compare the discriminatory power of several classification schemes based on substructure indicators. Performance, evaluated by contingency table statistics, varies, and no single scheme provides sufficient applicability and reliability for pre-filtering chemical inventories. Major improvements are feasible with the combined use of three classification schemes: assignments of baseline toxicants are protective, recognition of excess toxicants is acceptable, and the applicability range increases favourably.
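The pre-filtering idea can be illustrated with a small sketch. The abstract does not state how the three schemes are combined, so the rule below (call a chemical "baseline" only when every applicable scheme agrees) is an assumption chosen to be protective, and the function names, labels, and toy votes are hypothetical. The statistics are the usual sensitivity and specificity from a 2x2 contingency table, plus the fraction of chemicals that received any assignment.

```python
from typing import Sequence


def combine_conservatively(votes: Sequence[str]) -> str:
    """Combine per-scheme votes ('baseline', 'excess', or 'unknown').

    Assumed rule (not taken from the abstract): a chemical is labelled
    'baseline' only if every scheme that applies to it says 'baseline'.
    """
    applicable = [v for v in votes if v != "unknown"]
    if not applicable:
        return "unknown"
    return "baseline" if all(v == "baseline" for v in applicable) else "excess"


def contingency_stats(predicted, observed, positive="excess"):
    """Sensitivity, specificity, and applicability from a 2x2 contingency table."""
    pairs = [(p, o) for p, o in zip(predicted, observed) if p != "unknown"]
    tp = sum(p == positive and o == positive for p, o in pairs)
    tn = sum(p != positive and o != positive for p, o in pairs)
    fp = sum(p == positive and o != positive for p, o in pairs)
    fn = sum(p != positive and o == positive for p, o in pairs)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "applicability": len(pairs) / len(predicted) if predicted else float("nan"),
    }


# Invented toy votes from three hypothetical schemes for four chemicals.
votes_per_chemical = [
    ("baseline", "baseline", "unknown"),
    ("baseline", "excess", "baseline"),
    ("unknown", "unknown", "unknown"),
    ("excess", "unknown", "excess"),
]
observed = ["baseline", "excess", "excess", "baseline"]
predicted = [combine_conservatively(v) for v in votes_per_chemical]
stats = contingency_stats(predicted, observed)
```

Under this rule a single "excess" vote overrides the others, which trades specificity for protectiveness; chemicals that no scheme covers stay "unknown" and lower the applicability figure.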

6.
7.
8.
Pharmacophore modeling of large, drug-like molecules, such as the dopamine reuptake inhibitor GBR 12909, is complicated by their flexibility. A comprehensive hierarchical clustering study of two GBR 12909 analogs was performed to identify representative conformers for input to three-dimensional quantitative structure–activity relationship studies of closely related analogs. Two data sets of more than 700 conformers each, produced by random search conformational analysis of a piperazine and a piperidine GBR 12909 analog, were studied. Several clustering studies were carried out based on different feature sets that include the important pharmacophore elements. The distance maps, the plot of the effective number of clusters versus the actual number of clusters, and the novel derived clustering statistic, the percentage change in the effective number of clusters, were shown to be useful in determining the appropriate clustering level. Six clusters were chosen for each analog, each representing a different region of the torsional angle space that determines the relative orientation of the pharmacophore elements. Conformers of each cluster that are representative of these regions were identified and compared for each analog. This study illustrates the utility of hierarchical clustering for classifying conformers of highly flexible molecules in terms of the three-dimensional spatial orientation of key pharmacophore elements.

9.
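Prediction of human intestinal absorption of drugs based on the support vector machine method   Total citations: 2 (self-citations: 0, citations by others: 2)
To predict the absorption of molecules in the human small intestine, 102 molecular descriptors characterizing electronic, topological, geometric, and molecular-shape features were calculated, and a genetic-algorithm variable selection method reduced them to 47 descriptors. The data set comprised 230 compounds, of which 69 are not absorbed (HIA-) and 161 are absorbed (HIA+). The resulting SVM model was validated by 5-fold cross-validation and with an independent test set, giving prediction accuracies of 79.1% and 77.1%, respectively, which are in good agreement. During model validation, cluster analysis was used to compose the training and test sets, which ensured the stability of the model and improved modeling efficiency.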
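As a rough illustration of the validation step, the sketch below implements a stratified 5-fold split in plain Python and uses it to score a stand-in model. Only the class sizes (69 and 161) come from the abstract; the function names, the placeholder descriptors, and the majority-class "model" are assumptions for illustration and are not the SVM used in the study.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Fraction of held-out samples predicted correctly over all k folds."""
    correct = 0
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in held_out)
    return correct / len(labels)


# Class sizes from the abstract; the descriptors are placeholders.
labels = ["HIA-"] * 69 + ["HIA+"] * 161
features = [[0.0] for _ in labels]


def fit_majority(train_x, train_y):
    """Stand-in model: remember the most frequent training label."""
    return max(set(train_y), key=train_y.count)


def predict_majority(model, x):
    """Always predict the remembered majority label."""
    return model


baseline_accuracy = cross_validate(features, labels, fit_majority, predict_majority)
```

With these class sizes, a model that always predicts "absorbed" is right for 161 of 230 compounds (70%), which is a useful reference point when reading the reported 79.1% and 77.1%.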

10.
A public bacterial mutagenicity database was classified into 2-D structural families using a set of specific algorithms and clustering techniques that find overlapping classes of compounds based on chemical substructures. Structure-activity relationships were learned from the biological activity of the compounds within each class and used to identify rules that define substructures potentially responsible for mutagenic activity. In addition, this method of analysis was used to compare the pharmacologically relevant substructures of test compounds with their potentially toxic substructures, making this a potentially valuable in silico profiling tool for lead selection and optimization.

11.
Decision support for selecting suitable QSARs for predictive purposes is provided by a stepwise procedure. The first tier pre-filters the compounds based on substructure indicators for baseline versus excess toxicity. This step, if sufficiently conservative, discriminates chemicals whose toxicity can be reliably estimated from their log KOW from those that require further classification by biological and chemical domain. A test set of 115 chemicals from 9 different MOA classes was used to compare the discriminatory power of several classification schemes based on substructure indicators. Performance, evaluated by contingency table statistics, varies, and no single scheme provides sufficient applicability and reliability for pre-filtering chemical inventories. Major improvements are feasible with the combined use of three classification schemes: assignments of baseline toxicants are protective, recognition of excess toxicants is acceptable, and the applicability range increases favourably.

12.
13.
Multispectral images such as multispectral chemical images or multispectral satellite images provide detailed data with information in both the spatial and spectral domains. Many segmentation methods for multispectral images are based on a per-pixel classification, which uses only spectral information and ignores spatial information. A clustering algorithm based on both spectral and spatial information would produce better results.

In this work, spatial refinement clustering (SpaRef), a new clustering algorithm for multispectral images, is presented. Spatial information is integrated with partitional and agglomerative clustering processes. The number of clusters is identified automatically. SpaRef is compared with a set of well-known clustering methods on a compact airborne spectrographic imager (CASI) image of an area in the Klompenwaard, The Netherlands. The clusters obtained show improved results. Applying SpaRef to multispectral chemical images would be a straightforward step.


14.
Which compound classes are best suited as probes and tools for chemical biology research and as inspiration for medicinal chemistry programs? Chemical space is enormously large and cannot be explored exhaustively by synthesis efforts. Methods are required that allow one to identify and map the biologically relevant subspaces of vast chemical space, and that serve as hypothesis-generating tools for inspiring synthesis programs. Biology-oriented synthesis (BIOS) builds on structural conservatism in the evolution of proteins and natural products. It employs a hierarchical classification of bioactive compounds according to structural relationships and type of bioactivity, and selects the scaffolds of bioactive molecule classes as starting points for the synthesis of compound collections with focused diversity. Navigation in chemical space is facilitated by Scaffold Hunter, an intuitively accessible and highly interactive software tool. Small molecules synthesized according to BIOS are enriched in bioactivity. They facilitate the analysis of complex biological phenomena by means of acute perturbation and may serve as novel starting points to inspire drug discovery programs.

15.
The predictive accuracy of a model is the primary concern of computational chemists in quantitative structure-activity relationship (QSAR) investigations. It is hypothesized that a model based on analogous chemicals will exhibit better predictive performance than one derived from diverse compounds. This paper develops a novel scheme called "clustering first, and then modeling" to build local QSAR models for the subsets that result from clustering the training set according to structural similarity. For validation and prediction, the validation set and test set were first classified into the subsets corresponding to those of the training set, and the prediction was then performed with the relevant local model for each subset. The approach was validated on two independent data sets by local modeling and prediction of baseline toxicity for the fathead minnow. In this process, hierarchical clustering was employed for cluster analysis, k-nearest neighbor for classification, and partial least squares for model generation. The statistical results indicated that the predictive performance of the local models based on the subsets was much superior to that of the global model based on the whole training set, which is consistent with the hypothesis. The approach proposed here is promising for extension to QSAR modeling of various physicochemical properties, biological activities, and toxicities.
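The "clustering first, and then modeling" workflow can be sketched in a few lines. This is a deliberately simplified stand-in, not the authors' pipeline: cluster labels are taken as given instead of coming from hierarchical clustering, a query is assigned with 1-nearest neighbor on a single descriptor instead of k-nearest neighbor on structural similarity, and each local model is an ordinary least-squares line instead of partial least squares. All names and the toy numbers are invented for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return slope, y_bar - slope * x_bar


def fit_local_models(descriptor, activity, cluster):
    """Fit one line per cluster of the training set."""
    models = {}
    for label in set(cluster):
        members = [i for i, c in enumerate(cluster) if c == label]
        models[label] = fit_line(
            [descriptor[i] for i in members], [activity[i] for i in members]
        )
    return models


def predict_local(x, descriptor, cluster, models):
    """Assign x to the cluster of its nearest training compound, then predict."""
    nearest = min(range(len(descriptor)), key=lambda i: abs(descriptor[i] - x))
    slope, intercept = models[cluster[nearest]]
    return slope * x + intercept


# Invented toy training set: two clusters with different trends.
descriptor = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
activity = [2.0, 4.0, 6.0, 5.0, 4.0, 3.0]
cluster = ["A", "A", "A", "B", "B", "B"]

models = fit_local_models(descriptor, activity, cluster)
prediction_a = predict_local(2.5, descriptor, cluster, models)
prediction_b = predict_local(11.5, descriptor, cluster, models)
```

In this toy set a single global line would fit neither trend well, whereas each local line reproduces its own cluster exactly, which is the intuition behind the hypothesis stated in the abstract.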

16.
Four different two-dimensional fingerprint types (MACCS, Unity, BCI, and Daylight) and nine methods of selecting optimal cluster levels from the output of a hierarchical clustering algorithm were evaluated for their ability to select clusters that represent the chemical series present in some typical examples of chemical compound data sets. The methods were evaluated using Ward's clustering algorithm on subsets of the publicly available National Cancer Institute HIV data set, as well as on compounds from our corporate data set. We make a number of observations and recommendations about the choice of fingerprint type and cluster level selection method for use in this type of clustering.

17.
Three classes of arbitrary quantitative molecular similarity analysis (QMSA) methods have been computed using atom pairs, topological indices (TIs), and physicochemical properties. Tailored QMSA models have been developed using a selected number of TIs chosen by ridge regression. The methods have been applied to the K-nearest neighbor based estimation of log P for two sets of chemicals. Results show that the property-based and tailored QMSA methods are superior to the arbitrary similarity methods in estimating log P for both sets of chemicals.
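A minimal sketch of similarity-based K-nearest neighbor estimation follows. It assumes each chemical is represented by a set of feature strings (standing in for atom pairs) and that similarity is the Tanimoto coefficient on those sets; the abstract does not specify the similarity measure, so both choices are assumptions, and the feature strings and log P values are invented toy data.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_estimate(query: set, library, k: int = 2) -> float:
    """Estimate a property as the mean over the k most similar library entries.

    `library` is a list of (feature_set, property_value) pairs.
    """
    ranked = sorted(library, key=lambda item: tanimoto(query, item[0]), reverse=True)
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Invented toy library: feature sets with made-up log P values.
library = [
    ({"C-1-C", "C-2-O"}, 1.0),
    ({"C-1-C", "C-2-O", "C-3-N"}, 2.0),
    ({"N-1-N"}, -1.0),
]
query = {"C-1-C", "C-2-O"}
estimate = knn_estimate(query, library, k=2)
```

Swapping `tanimoto` for a distance in a space of selected topological indices or physicochemical properties would give the tailored and property-based variants that the abstract compares, with the rest of the estimator unchanged.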

18.
The most desirable compound leads from high-throughput assays are those with novel biological activities resulting from their action on a single biological target. Valuable resources can be wasted on compound leads with significant 'side effects' on additional biological targets; therefore, technical refinements to identify compounds that primarily have effects resulting from a single target are needed. This study explores the use of multiple assays of a chemical library and a statistic based on entropy to identify lead compound classes that have patterns of assay activity resulting primarily from small molecule action on a single target. This statistic, called the coincidence score, discriminates with 88% accuracy compound classes known to act primarily on a single target from compound classes with significant side effects on nonhomologous targets. Furthermore, a significant number of the compound classes predicted to have primarily single-target effects contain known bioactive compounds. We also show that a compound's known biological target or mechanism of action can often be suggested by its pattern of activities in multiple assays.

19.
20.
Proteins are the macromolecules responsible for almost all biological processes in a cell. With a large number of protein sequences available from different sequencing projects, the challenge for scientists is to characterize their functions. Because wet-lab methods are time-consuming and expensive, many computational methods, such as FASTA, PSI-BLAST, DNA microarray clustering, and Nearest Neighborhood classification on protein–protein interaction networks, have been proposed. The support vector machine (SVM) is one such method and has been used successfully for problems such as protein fold recognition and protein structure prediction. In 2003, Cai et al. used SVMs to classify proteins into functional classes and to predict their function, representing the protein sequences by their physico-chemical properties. In this paper, a model comprising feature subset selection followed by a multiclass SVM is proposed to determine the functional class of a newly generated protein sequence. To train and test the model, 32 physico-chemical properties of enzymes from 6 enzyme classes are considered. To determine the features that contribute significantly to functional classification, the Sequential Forward Floating Selection (SFFS), Orthogonal Forward Selection (OFS), and SVM Recursive Feature Elimination (SVM-RFE) algorithms are used. Of the 32 properties considered initially, only 20 features are sufficient to classify the proteins into their functional classes, with an accuracy ranging from 91% to 94%. A comparison shows that OFS followed by SVM performs better than the other methods. Our model generalizes the existing model to include multiclass classification and to identify the most significant features affecting protein function.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号