Similar Articles
1.
A dataset of 115 compounds with HSA binding data is divided into a training set and a test set on the basis of molecular similarity and cluster analyses. Both Kier–Hall valence connectivity indices and 4D-fingerprint similarity measures were applied to this dataset. Four predictive schemes (SM, SA, SR, SC) were applied to the test set, each based on the similarity of a test compound to the compounds in the training set. The first scheme (SM) predicts the binding affinity of a test compound using only the binding affinity of the most similar training-set compound. This scheme has relatively poor predictivity with both the Kier–Hall valence connectivity index similarity measures and the 4D-fingerprint similarity measures. The other three schemes (SA, SR, SC) assign a weighting coefficient to each of the ten most similar training-set compounds, and they predict the test set reasonably well. The scheme that groups the most similar compounds into differently weighted clusters predicts the test set best. The 4D-fingerprints provide 36 individual IPE/IPE-type molecular similarity measures. This study indicates that, for this dataset, some of these similarity measures are highly similar to one another. For this particular dataset, the Kier–Hall valence connectivity index similarity measures and the 4D-fingerprints have nearly the same predictivity.
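
The SM scheme, and the general idea of weighting the ten most similar training compounds, can be sketched as follows. This is a minimal illustration and not the authors' code. The abstract does not give the SA, SR or SC weighting coefficients, so `predict_weighted_topk` simply assumes weights proportional to similarity. Both function names are hypothetical.

```python
def predict_sm(similarities, affinities):
    """SM-style prediction: the affinity of the single most similar training compound."""
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    return affinities[best]


def predict_weighted_topk(similarities, affinities, k=10):
    """Weighted mean affinity of the k most similar training compounds.

    Assumption: weights are proportional to similarity. The SA, SR and SC
    schemes in the paper use their own coefficients, which are not given here.
    """
    top = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)[:k]
    total = sum(similarities[i] for i in top)
    if total == 0:  # no similarity information: fall back to a plain mean
        return sum(affinities[i] for i in top) / len(top)
    return sum(similarities[i] * affinities[i] for i in top) / total
```

Here `similarities[i]` is the similarity of the test compound to training compound `i`, and `affinities[i]` is that compound's measured binding affinity.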

2.
Hierarchical clustering algorithms such as Ward's or complete-link are commonly used in compound selection and diversity analysis. Many such applications use binary representations of chemical structures, such as MACCS keys or Daylight fingerprints. They pair these with dissimilarity measures such as the Euclidean or the Soergel measure. However, hierarchical clustering algorithms can generate ambiguous results because of what the cluster analysis literature calls the ties in proximity problem: compounds or clusters of compounds that are equidistant from a given compound or cluster in a collection. Ambiguous ties can occur when clustering only a few hundred compounds, and the more compounds there are to cluster, the greater the chance of significant ambiguity. In other words, the possibility of ambiguity grows as the number of ties in proximity increases relative to the total number of proximities. We show by a probabilistic argument that, to ensure there are no ambiguous ties, the number of compounds must be less than $2n^{1/4}$, where $n$ is the total number of proximities. This holds provided the measure used to generate the proximities produces a uniform distribution without statistically preferred values. The common measures do not produce uniformly distributed proximities. Instead, they have statistically preferred values that tend to increase the number of ties in proximity. Hence, for a bit-vector representation of a given length, two properties of a similarity measure are directly related to the number of ties in proximity for a given data set: the number of possible proximity values, and the distribution of statistically preferred values. We explore the ties in proximity problem using several chemical collections with varying degrees of diversity, together with several common similarity measures and clustering algorithms. Our results are consistent with our probabilistic argument and show that this problem is significant even for relatively small compound sets.
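
To see how ties in proximity arise with binary fingerprints, the sketch below does two things. It computes exact Tanimoto similarities for all pairs of fingerprints, then counts the pairs of proximities that share the same value. It assumes fingerprints are stored as Python integers used as bit masks. Tanimoto serves only as an example of a common measure, and `tanimoto` and `count_tied_pairs` are hypothetical helper names.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def tanimoto(a: int, b: int) -> Fraction:
    """Exact Tanimoto similarity of two fingerprints stored as integer bit masks."""
    union = bin(a | b).count("1")
    if union == 0:  # two empty fingerprints: treat as identical
        return Fraction(1)
    return Fraction(bin(a & b).count("1"), union)


def count_tied_pairs(fingerprints):
    """Number of pairs of pairwise proximities that have exactly the same value."""
    values = Counter(tanimoto(a, b) for a, b in combinations(fingerprints, 2))
    return sum(c * (c - 1) // 2 for c in values.values())
```

`Fraction` keeps the similarities exact, so ties are detected by exact equality rather than by floating-point comparison. On bit vectors of length L, the Tanimoto coefficient can only take values a/b with 0 ≤ a ≤ b ≤ L, which is why ties become frequent as collections grow.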

3.
A new method that combines 4D-molecular similarity measures with cluster analysis to construct optimum QSAR models is applied to a data set of 150 chemically diverse compounds to build optimum blood-brain barrier (BBB) penetration models. The complete data set is divided into subsets by cluster analysis based on 4D-molecular similarity measures. The compounds in each cluster subset are further divided into a training set and a test set. A predictive QSAR model is constructed for each cluster subset from the corresponding training set. Each model best predicts the test-set compounds that the 4D-molecular similarity measures assign to the same cluster subset from which the model was derived. The results suggest that the specific properties governing blood-brain barrier permeability may vary across chemically diverse compounds. Partitioning compounds into chemically similar classes is essential for constructing predictive blood-brain barrier penetration models. Such models can then embed the key physicochemical properties of a given chemical class.
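
The cluster-then-model scheme can be illustrated by routing each test compound to its nearest cluster and applying that cluster's model. In this minimal sketch, Euclidean distance to a cluster centroid stands in for the 4D-molecular similarity measures. Each cluster's QSAR model is assumed to be linear. The centroids, models and function names are all hypothetical.

```python
import math


def nearest_cluster(x, centroids):
    """Index of the centroid closest (Euclidean distance) to descriptor vector x."""
    return min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))


def predict(x, centroids, models):
    """Apply the linear model (intercept, coefficients) of the cluster x is assigned to."""
    intercept, coefs = models[nearest_cluster(x, centroids)]
    return intercept + sum(c * v for c, v in zip(coefs, x))
```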

4.
We carried out a mathematical study of 72 chemical elements using the chemotopological method. We selected 128 properties (physico-chemical, geochemical and chemical) to characterize the elements. We then identified correlated properties and reduced their number to 90, so that each element is defined as a 90-tuple. Next, we applied principal component analysis and cluster analysis (CA), using 4 similarity functions and 5 grouping methodologies. We then calculated a consensus tree for the 20 dendrograms generated by the CA. From the consensus tree we extracted the similarity relationships and used them to build a basis for a topology on the set of chemical elements. Finally, we calculated topological properties (closures, derived sets, boundaries, interiors and exteriors) of several subsets of chemical elements. We found that the alkali metals, alkaline earth metals and noble gases appear unrelated to the rest of the elements. We also found that the boundary of the non-metals consists of the semimetals, which form a stair shape on the periodic table.
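
Once a basis for a topology on a finite set is available, the interior, closure, boundary and exterior of a subset follow directly from the definitions. The interior is the union of the basic open sets contained in the subset. A point lies in the closure when every basic open set containing it meets the subset. The sketch below assumes the basis is given as a list of Python sets that covers all points. It does not reproduce the authors' construction of the basis from the consensus tree.

```python
def interior(subset, basis):
    """Union of all basic open sets contained in `subset`."""
    result = set()
    for b in basis:
        if b <= subset:
            result |= b
    return result


def closure(subset, points, basis):
    """Points x such that every basic open set containing x meets `subset`."""
    return {x for x in points if all(b & subset for b in basis if x in b)}


def boundary(subset, points, basis):
    """Boundary = closure minus interior."""
    return closure(subset, points, basis) - interior(subset, basis)


def exterior(subset, points, basis):
    """Exterior = interior of the complement of `subset`."""
    return interior(set(points) - set(subset), basis)
```

For example, take the hypothetical toy points {"a", "b", "c", "d"} with basis [{"a"}, {"a", "b"}, {"c"}, {"c", "d"}]. The closure of {"a"} is {"a", "b"}, its boundary is {"b"}, and its exterior is {"c", "d"}.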

6.
This paper introduces new ideas for quantifying the similarity between chemical compounds. The method uses similarity measures derived by comparing two strings. The resulting similarity data are then analyzed and used to identify clusters. Entities within a cluster are more homogeneous and more similar to one another than to entities outside it.
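
A simple way to turn pairwise string similarities into clusters is single-link clustering at a fixed cut. Every pair whose similarity reaches a threshold is linked, and the connected components become the clusters. The sketch assumes SMILES-like strings as input. It uses `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in string comparison. The paper's own similarity measure and clustering procedure may differ, and `threshold` is a hypothetical parameter.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Stand-in string similarity in [0, 1] (difflib's matching-blocks ratio)."""
    return SequenceMatcher(None, a, b).ratio()


def threshold_clusters(strings, threshold):
    """Connected components of the graph that links pairs with similarity >= threshold."""
    parent = list(range(len(strings)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            if string_similarity(strings[i], strings[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(strings):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())
```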

7.
This paper compares the performance of two clustering methods, DPClus graph clustering and hierarchical clustering, for classifying volatile organic compounds (VOCs) using a fingerprint-based similarity measure between chemical structures. The clustering results from the two methods were compared to determine the degree of cluster overlap and how well each method classified the chemical structures of the VOCs into clusters. We also point out the advantages and limitations of both methods. In conclusion, chemical similarity measures can be used to predict the biological activities of a compound, with applications in the medical, pharmaceutical and agrotechnology fields.
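
The degree of overlap between two clusterings of the same compounds can be quantified in several ways. One standard choice is the Rand index: the fraction of compound pairs on which the two clusterings agree, either by placing the pair together or by separating it. The sketch below illustrates that index only. The abstract does not state which overlap measure the authors used.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree (both same or both different cluster)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    if len(labels_a) < 2:
        raise ValueError("at least two items are needed to form a pair")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs
    )
    return agree / len(pairs)
```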

8.
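Molecular similarity and the prediction of pKa values of substituted phenols (total citations: 1; self-citations: 0; citations by others: 1)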

9.
This paper describes the general design and application of CerBeruS, a computer-based system that supports sequential screening. CerBeruS stands for cluster-based selection, and cluster analysis is the pivotal part of the system. CerBeruS uses Ward's clustering method to partition the data set to be screened into smaller, more homogeneous subsets. One representative is picked from each subset and suggested as a screening candidate. The number of compounds submitted to screening is usually dictated by the capacity of the assay. Even so, CerBeruS provides a statistical measure that computes the optimal number of clusters in the data set, and this measure serves as a point of reference for all screening experiments. Different hierarchies of subsets are stored in an Oracle database. Information about the size and content of a cluster can be retrieved from this database through a Visual Basic application. How these components work together in CerBeruS is demonstrated on a large data set. In addition, we show that the statistical measure can be used to find an optimal trade-off between screening effort and number of hits.
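
After Ward's clustering has partitioned the data set, one representative must be chosen per cluster. The abstract does not say how CerBeruS makes this choice. The sketch below therefore assumes the medoid, which is the member with the smallest total distance to the other members of its cluster. `pick_representatives` is a hypothetical name, and the distance function is supplied by the caller.

```python
def pick_representatives(labels, distance):
    """Map each cluster label to its medoid: the member index with the smallest total distance.

    `labels[i]` is the cluster of compound i; `distance(i, j)` returns a dissimilarity.
    Ties are broken in favour of the lowest index.
    """
    members = {}
    for idx, lab in enumerate(labels):
        members.setdefault(lab, []).append(idx)
    return {
        lab: min(idxs, key=lambda i: sum(distance(i, j) for j in idxs))
        for lab, idxs in members.items()
    }
```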

10.
Two simple linear notation systems are proposed for encoding molecular structure, including stereochemical elements. Both systems yield a unique numbering of the molecular graph and therefore a unique linear notation. Both notations are extremely compact and require only standard chemical symbols. A string comparison technique is developed to measure the similarity of two molecular linear notations. This procedure defines a molecular similarity index whose values range from zero to unity: zero characterizes complete dissimilarity and unity denotes identity. The notation and similarity index procedures are applied to several small molecular structures.
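
A string-comparison similarity index with the properties described can be built from the longest common subsequence (LCS) of two notations: S = 2·LCS(a, b) / (|a| + |b|). This index is zero for complete dissimilarity and unity only for identical strings. It is a swapped-in technique chosen for illustration, not necessarily the comparison the authors developed, and the notation strings in the checks are arbitrary.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming over two rows."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            # Extend a match diagonally, or carry the best of the left/upper cells.
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity_index(a, b):
    """LCS-based similarity in [0, 1]; equals 1 only when the two strings are identical."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))
```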

11.
Molecular quantum similarity is evaluated for enantiomers of conformationally flexible molecules, using the Boltzmann-weighted similarity index we proposed earlier. The conformers of the enantiomers of the amino acids alanine, asparagine, cysteine, leucine, serine, and valine were examined. In addition to global indices, we evaluate local similarity using our previously proposed local similarity index based on Hirshfeld partitioning. This further illustrates Mezey's holographic electron density theorem in chiral systems and quantifies the dissimilarity of enantiomers.
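
One way to form a Boltzmann-weighted similarity between two conformationally flexible molecules is to weight each conformer by its Boltzmann population and average the conformer-pair similarities. The sketch assumes relative conformer energies in kJ/mol and a precomputed matrix of conformer-pair similarities. The double-sum form is an assumption and may not match the authors' exact definition of their index.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def boltzmann_weights(energies, temperature=298.15):
    """Normalised Boltzmann populations for conformer energies in kJ/mol."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / (R_KJ_PER_MOL_K * temperature)) for e in energies]
    z = sum(factors)
    return [f / z for f in factors]


def boltzmann_weighted_similarity(sim, energies_a, energies_b, temperature=298.15):
    """Sum over conformer pairs of w_i * w_j * sim[i][j] (assumed form of the index)."""
    wa = boltzmann_weights(energies_a, temperature)
    wb = boltzmann_weights(energies_b, temperature)
    return sum(wa[i] * wb[j] * sim[i][j] for i in range(len(wa)) for j in range(len(wb)))
```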

12.
Molecular quantum similarity is evaluated for enantiomers of molecules possessing a chiral axis, extending previous studies on molecules with a single asymmetric carbon atom. As a case study, the enantiomers of substituted allenes are examined. Global similarity is studied with the existing similarity indices defined by Carbó and by Hodgkin-Richards. In addition, we evaluate local similarity using our previously proposed local similarity index based on Hirshfeld partitioning, to quantify the consequences of Mezey's holographic electron density theorem in chiral systems. We also study the relation between optical activity and dissimilarity.
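
Both global indices mentioned above are computed from the quantum similarity integrals: the self-similarities Z_AA and Z_BB, and the cross-similarity Z_AB. The Carbó index is C_AB = Z_AB / sqrt(Z_AA·Z_BB), and the Hodgkin-Richards index is H_AB = 2·Z_AB / (Z_AA + Z_BB). This minimal sketch assumes the integrals have already been computed elsewhere.

```python
import math


def carbo_index(z_ab, z_aa, z_bb):
    """Carbó index: cross-similarity normalised by the geometric mean of self-similarities."""
    return z_ab / math.sqrt(z_aa * z_bb)


def hodgkin_richards_index(z_ab, z_aa, z_bb):
    """Hodgkin-Richards index: cross-similarity normalised by the arithmetic mean."""
    return 2 * z_ab / (z_aa + z_bb)
```

For example, suppose ρ_B = 2ρ_A. Then Z_AB = 2Z_AA and Z_BB = 4Z_AA, so the Carbó index is 1 while the Hodgkin-Richards index is 0.8. The Carbó index is insensitive to scaling of a density, whereas the Hodgkin-Richards index is not.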

16.
Combinatorial chemistry and high-throughput screening have produced a tremendous increase in chemical structure and biological activity data. This has created a need for sophisticated graphical tools for visualizing and exploring structure-activity data. Visualization plays an important role in exploring and understanding relationships within such multidimensional data sets. Many chemoinformatics applications use standard clustering techniques to organize structure-activity data, but they differ significantly in how they visualize the clustered data. Molecular Property eXplorer (MPX) is unique in presenting clustered data as heatmaps and tree-maps. MPX uses agglomerative hierarchical clustering to organize data on one of two bases: the similarity between 2D chemical structures, or the similarity across a predefined profile of biological assay values. Visualizing hierarchical clusters as tree-maps and heatmaps shows the cluster members together with their associated assay values. Tree-maps convey both the spatial relationship among cluster members and the value of a single property (activity) associated with each member. Heatmaps show the cluster members across an activity profile. Unlike tree-maps, however, heatmaps do not convey the spatial relationship between cluster members. MPX seamlessly integrates tree-maps and heatmaps to represent multidimensional structure-activity data in a visually intuitive manner. MPX also provides tools for three further tasks. It can cluster data by chemical structure or by activity profile, and it can display 2D chemical structures. It can also query the data by a specified activity range or by a set of chemical structure criteria (e.g., Tanimoto similarity, substructure match, and "R-group" analysis).
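
The combined activity-range and similarity query described for MPX can be sketched as a simple filter. This is not MPX's API. The `Compound` record, the set-of-bits fingerprint representation and the `query` function are hypothetical, and Tanimoto similarity is computed directly on sets of on-bit indices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    fingerprint: frozenset  # indices of the bits that are set
    activity: float


def tanimoto(a, b):
    """Tanimoto similarity of two on-bit sets (1.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def query(compounds, reference, min_sim, activity_range):
    """Names of compounds within the activity range and at least min_sim similar to reference."""
    lo, hi = activity_range
    return [
        c.name
        for c in compounds
        if lo <= c.activity <= hi and tanimoto(c.fingerprint, reference) >= min_sim
    ]
```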

17.
Various molecular similarity measures (overlap, Coulomb, kinetic, electrostatic energy) and similarity indices (Carbó, Hodgkin-Richards, Kulczynski, Shape Tanimoto) are applied to the superposition of 3D promolecular electron density (PED) distributions. The paper is original in two respects. It uses smoothed PEDs, which reduce the number of local solutions to a superposition problem, and it uses the less common kinetic and electrostatic energy similarity measures. Results are obtained for a family of five rigid endothiapepsin ligands. These ligands were already considered in previous applications based on graph representations of their PEDs. In the present work, the best alignments of molecules of different sizes were obtained with smoothed PEDs and the kinetic similarity measure, combined with the Kulczynski or Shape Tanimoto index.
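
Like the Carbó and Hodgkin-Richards indices, the Kulczynski and Shape Tanimoto indices are functions of the similarity integrals Z_AA, Z_BB and Z_AB:

- K_AB = (Z_AB/Z_AA + Z_AB/Z_BB) / 2
- T_AB = Z_AB / (Z_AA + Z_BB - Z_AB)

The sketch assumes those integrals are already available. They may come from whichever similarity measure is chosen, e.g. overlap or kinetic.

```python
def kulczynski_index(z_ab, z_aa, z_bb):
    """Kulczynski index: mean of Z_AB normalised by each self-similarity."""
    return 0.5 * (z_ab / z_aa + z_ab / z_bb)


def shape_tanimoto_index(z_ab, z_aa, z_bb):
    """Shape Tanimoto index: Z_AB over the 'union' Z_AA + Z_BB - Z_AB."""
    return z_ab / (z_aa + z_bb - z_ab)
```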

18.
Four two-dimensional fingerprint types (MACCS, Unity, BCI, and Daylight) were evaluated, along with nine methods for selecting optimal cluster levels from the output of a hierarchical clustering algorithm. The criterion was their ability to select clusters that represent the chemical series present in typical chemical compound data sets. The methods were evaluated with Ward's clustering algorithm on subsets of the publicly available National Cancer Institute HIV data set, as well as on compounds from our corporate data set. We make a number of observations and recommendations about the choice of fingerprint type and cluster level selection method for this type of clustering.
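
One widely used criterion for choosing a cut level in a hierarchy is the mean silhouette width. For each compound, it compares the mean distance to the compound's own cluster (a) with the mean distance to the nearest other cluster (b). The sketch below picks the level with the highest mean silhouette. It assumes a precomputed distance matrix and one cluster labelling per candidate level. The silhouette is only an example criterion and may not be one of the nine methods evaluated.

```python
def mean_silhouette(dist, labels):
    """Mean silhouette width of one labelling, given a full distance matrix."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singletons contribute 0 by convention
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def best_level(dist, labelings):
    """Index of the candidate labelling (cut level) with the highest mean silhouette."""
    return max(range(len(labelings)), key=lambda k: mean_silhouette(dist, labelings[k]))
```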

19.
A simple manipulation of the Euclidean distance expression yields a scaled dissimilarity index whose values lie in the interval [0,1]. We present the theoretical background, the index's application to quantum similarity, and its use for general artificial intelligence purposes. The origin of the Hodgkin-Richards index is analyzed and compared with the Carbó quantum similarity index.
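
One manipulation of this kind uses the quantum similarity integrals Z_AA = ∫ρ_A², Z_BB = ∫ρ_B² and Z_AB = ∫ρ_Aρ_B for densities ρ_A and ρ_B. It scales the squared Euclidean distance between the two densities by Z_AA + Z_BB. The derivation below shows how this connects to the Hodgkin-Richards index H_AB. It is a sketch, not necessarily the exact manipulation used in the paper.

$$
D_{AB}^2 = \int \left(\rho_A - \rho_B\right)^2 d\mathbf{r} = Z_{AA} + Z_{BB} - 2 Z_{AB}
$$

$$
d_{AB} = \frac{D_{AB}^2}{Z_{AA} + Z_{BB}} = 1 - \frac{2 Z_{AB}}{Z_{AA} + Z_{BB}} = 1 - H_{AB}
$$

By the Cauchy-Schwarz inequality, $Z_{AB} \le \sqrt{Z_{AA} Z_{BB}}$, and the arithmetic-geometric mean inequality gives $\sqrt{Z_{AA} Z_{BB}} \le (Z_{AA} + Z_{BB})/2$. Together these give $d_{AB} \ge 0$. For nonnegative densities $Z_{AB} \ge 0$, so $d_{AB} \le 1$.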
