Similar articles
 20 similar articles found (search time: 31 ms)
1.
Summary: Principal component projections of sets of mass spectra show clusters that contain compounds with common structural properties. The similarity of structures is investigated by an automatic search for large common substructures within the compounds of a cluster. The resulting spectra-structure relationships are helpful in the interpretation of spectra.

2.
Warmr: a data mining tool for chemical data   Total citations: 5 (self-citations: 0, citations by others: 5)

3.
In this paper we propose a new method for clustering chemical databases based on measurements of structural similarity. The proposed method allows dynamic adjustment of the size and number of the cells or clusters into which the database is classified. Classification is carried out using measurements of structural similarity obtained from the matching of molecular graphs. The classification process is open to the use of different similarity indexes and different measurements of matching. The process consists of projecting the similarity measures obtained among the elements of the database into a new similarity space. The dimension and characteristics of the projection space can be readjusted dynamically to suit the most favorable conditions of the problem under study; together with its simplicity and computational efficiency, this makes the proposed method appropriate for medium and large databases. The clustering method increases the performance of screening processes in chemical databases, facilitating the retrieval of chemical compounds that share all or part of the substructures of a given pattern. The work used a database of 498 natural compounds with wide molecular diversity, extracted from the free SPECS and BIOSPECS B.V. database.
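
The abstract above leaves the choice of similarity index open. As a minimal sketch, assuming the Tanimoto index is one of the admissible indexes and that the size of the common substructure of each pair of molecular graphs is already known from a graph-matching step, a pairwise similarity matrix could be assembled as follows. The function names and the toy sizes are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def tanimoto(size_a, size_b, size_common):
    """Tanimoto index from two graph sizes and the size of their common substructure."""
    union = size_a + size_b - size_common
    return size_common / union if union else 1.0


def similarity_matrix(sizes, common):
    """Build a symmetric similarity matrix.

    sizes:  list of graph sizes (e.g. heavy-atom counts), one per molecule.
    common: dict {(i, j): common-substructure size} with i < j; missing pairs count as 0.
    """
    n = len(sizes)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        s = tanimoto(sizes[i], sizes[j], common.get((i, j), 0))
        sim[i][j] = sim[j][i] = s
    return sim


# Toy input (made-up sizes): three molecules, two of the three pairs share atoms.
sim = similarity_matrix([10, 8, 6], {(0, 1): 6, (1, 2): 4})
```

The resulting matrix is the kind of input that a projection into a similarity space would start from; the projection itself is not sketched here because the abstract does not specify it.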

4.
A controlled, Raman-monitored chemical reduction of a molybdate and vanadate mixture affords a new type of molybdenum-oxide-based cluster showing an unprecedented level of inorganic structural organization. The cluster incorporates two nanosized substructures (a ring and a sphere) in an open clam-like assembly. Multiple methods indicate that the nanoring contains delocalized electrons and the nanosphere contains localized but interacting electrons.

5.
We present a novel approach for enhancing the diversity of a chemical library rooted in the theory of the wisdom of crowds. Our approach was motivated by a desire to tap into the collective experience of our global medicinal chemistry community and involved four basic steps: (1) Candidate compounds for acquisition were screened using various structural and property filters in order to eliminate clearly nondrug-like matter. (2) The remaining compounds were clustered together with our in-house collection using a novel fingerprint-based clustering algorithm that emphasizes common substructures and works with millions of molecules. (3) Clusters populated exclusively by external compounds were identified as "diversity holes," and representative members of these clusters were presented to our global medicinal chemistry community, who were asked to specify which ones they liked, disliked, or were indifferent to using a simple point-and-click interface. (4) The resulting votes were used to rank the clusters from most to least desirable, and to prioritize which ones should be targeted for acquisition. Analysis of the voting results reveals interesting voter behaviors and distinct preferences for certain molecular property ranges that are fully consistent with lead-like profiles established through systematic analysis of large historical databases.

6.
We discuss three dissimilarity measures between dendrograms defined over the same set: the triples, partition, and cluster indices. All of them decompose the dendrograms into subsets. In the case of the triples and partition indices, these subsets correspond to binary partitions containing some clusters, while in the cluster index, a novel dissimilarity measure introduced in this paper, the subsets are exclusively clusters. In chemical applications, the dendrograms gather clusters that contain similarity information about the data set under study. The cluster index is therefore the most suitable dissimilarity measure between dendrograms resulting from chemical investigations. An application example of the three measures illustrates the advantages of the cluster index over the other two in similarity studies. Finally, the cluster index is used to measure the differences between five dendrograms obtained by applying five common hierarchical clustering algorithms to a database of 1000 molecules.
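
The abstract does not give the formula of the cluster index. The sketch below therefore uses a stand-in that only shares its general idea, namely decomposing each dendrogram into its clusters and comparing those sets: a Jaccard distance between the two sets of clusters. The nested-tuple encoding of dendrograms and both function names are assumptions made for illustration.

```python
def clusters(tree, out=None):
    """Return (leaf set of tree, set of leaf sets of all internal nodes).

    A dendrogram is encoded as nested tuples; anything that is not a tuple is a leaf.
    """
    if out is None:
        out = set()
    if not isinstance(tree, tuple):
        return frozenset([tree]), out
    leaves = frozenset()
    for child in tree:
        child_leaves, _ = clusters(child, out)
        leaves |= child_leaves
    out.add(leaves)
    return leaves, out


def cluster_set_distance(t1, t2):
    """Jaccard distance between the cluster sets of two dendrograms (stand-in measure)."""
    _, c1 = clusters(t1)
    _, c2 = clusters(t2)
    union = c1 | c2
    if not union:  # both trees are single leaves
        return 0.0
    return 1.0 - len(c1 & c2) / len(union)


# Two dendrograms over the same four objects that disagree on the first merge.
d1 = ((("a", "b"), "c"), "d")
d2 = ((("a", "c"), "b"), "d")
distance = cluster_set_distance(d1, d2)
```

Here the two trees share the clusters {a, b, c} and {a, b, c, d} but differ in {a, b} versus {a, c}, so two of the four distinct clusters are common.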

7.
8.
9.
10.
We introduce a geometric analysis of random sphere packings based on the ensemble averaging of hard-sphere clusters generated via local rules, including a nonoverlap constraint for hard spheres. Our cluster ensemble analysis matches well with computer simulations and experimental data on random hard-sphere packing with respect to volume fractions and radial distribution functions. To model loose as well as dense sphere packings, various ensemble averages are investigated, obtained by varying the generation rules for clusters. Essential findings are a lower bound on volume fraction for random loose packing that is surprisingly close to the freezing volume fraction for hard spheres and, for random close packing, the observation of an unexpected split peak in the distribution of volume fractions for the local configurations. Our ensemble analysis highlights the importance of collective and global effects in random sphere packings by comparing clusters generated via local rules to random sphere packings and clusters that include collective effects.

11.
A substructure isomorphism matrix (n × p) contains binary elements describing which of the given p query structures (substructures) are part of the given n target structures (molecular structures). Such a matrix can be used to investigate the diversity of the target structures and allows the characterization and comparison of structural libraries. A square substructure isomorphism matrix (n × n) is obtained if the same structures are used as molecular structures and as substructures; this matrix contains full information about the topological hierarchy of the n structures. A hierarchical arrangement of chemical structures is useful for the evaluation of results obtained from searches in structure databases.
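
To make the n × p matrix concrete, here is a minimal sketch that assumes molecules are given as small labelled graphs (heavy atoms and bonds only, no bond orders) and tests substructure containment by brute force. The encoding, the helper names `mol`, `is_substructure`, and `isomorphism_matrix`, and the toy molecules are all assumptions; a real system would use a dedicated substructure-search algorithm.

```python
from itertools import permutations


def mol(atoms, bonds):
    """Encode a molecule as (list of element symbols, set of bonded index pairs)."""
    return list(atoms), {frozenset(b) for b in bonds}


def is_substructure(query, target):
    """Brute-force test: can the query be mapped injectively into the target
    so that atom labels match and every query bond lands on a target bond?"""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    if len(q_atoms) > len(t_atoms):
        return False
    for mapping in permutations(range(len(t_atoms)), len(q_atoms)):
        labels_ok = all(q_atoms[i] == t_atoms[m] for i, m in enumerate(mapping))
        if labels_ok and all(
            frozenset(mapping[i] for i in bond) in t_bonds for bond in q_bonds
        ):
            return True
    return False


def isomorphism_matrix(targets, queries):
    """Rows are target structures, columns are query substructures; entries are 0 or 1."""
    return [[int(is_substructure(q, t)) for q in queries] for t in targets]


# Toy data: heavy-atom skeletons only.
ethanol = mol("CCO", [(0, 1), (1, 2)])
dimethyl_ether = mol("COC", [(0, 1), (1, 2)])
methanol = mol("CO", [(0, 1)])
cc = mol("CC", [(0, 1)])
co = mol("CO", [(0, 1)])

matrix = isomorphism_matrix([ethanol, dimethyl_ether, methanol], [cc, co])
```

Passing the same list as both targets and queries gives the square n × n case described above. The permutation search grows factorially with molecule size, so this is only suitable for illustrating the definition.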

12.
Chemical fragment spaces are combinations of molecular fragments and connection rules. They offer the possibility to encode an enormously large number of chemical structures in a very compact format. Fragment spaces are useful both in similarity-based (2D) and structure-based (3D) de novo design applications. We present disconnection and filtering rules leading to several thousand unique, medium-sized fragments when applied to databases of druglike molecules. We evaluate alternative strategies to select subsets of these fragments, with the aim of maximizing the coverage of known druglike chemical space with a strongly reduced set of fragments. For these evaluations, we use the Ftrees fragment space method. We assess a diversity-oriented selection method based on maximum common substructures and a method biased toward high frequency of occurrence of fragments and find that they are complementary to each other.

13.
This paper presents a new methodology of chemical substructure recognition by interpretation of an infrared spectrum. The approach in spectrum interpretation is based on the determination of functional groups, which may be present or absent in compounds whose structure is unknown. The process of searching for spectrum-substructure correlation is realized by application of a statistical algorithm. In this method, correlations are generalized and condensed into a set of interpretation rules which are applied to the interpretation of an unknown compound's spectrum in order to predict whether the respective substructures are present or absent in the unknown molecule.

14.
In both physics and chemistry, increased attention is being paid to metal clusters. One reason for this interest is furnished by the surprising results that have been obtained from studies of the preparation, structural characterization and physical and chemical properties of the clusters. Whereas investigations of cluster reactivity are at present generally limited to three- or four-membered clusters, successful syntheses of clusters with many more metal atoms have recently been designed. These substances occupy an intermediate position between solid state chemistry and the chemistry of metal complexes. This review presents a versatile method for synthesizing metal clusters: the reaction of complexes of transition metal halides with silylated compounds such as E(SiMe3)2 (E = S, Se, Te) and E′R(SiMe3)2 (R = Ph, Me, Et; E′ = P, As, Sb). Although some of the compounds thus formed have already been prepared by other routes, the method affords ready access to both small and large transition metal clusters with unusual structures and valence electron concentrations; a variety of reactions in the ligand sphere are also possible.

15.
The COSMO cluster-continuum (CCC) solvation model is introduced for the calculation of standard Gibbs solvation energies of protons. The solvation sphere of the proton is divided into an inner proton-solvent cluster with covalent interactions and an outer solvation sphere that interacts electrostatically with the cluster. Thus, the solvation of the proton is divided into two steps that are calculated separately: 1) The interaction of the proton with one or more solvent molecules is calculated in the gas phase with high-level quantum-chemical methods (modified G3 method). 2) The Gibbs solvation energy of the proton-solvent cluster is calculated by using the conductor-like screening model (COSMO). For every solvent, the solvation of the proton in at least two (and up to 11) proton-solvent clusters was calculated. The resulting Gibbs solvation energies of the proton were weighted by using Boltzmann statistics. The model was evaluated for the calculation of Gibbs solvation energies by using experimental data for water, MeCN, and DMSO as a reference. Allowing structural relaxation of the proton-solvent clusters and using structurally relaxed Gibbs solvation energies improved the agreement with experimental data, especially for larger clusters. This variation is denoted as the relaxed COSMO cluster-continuum (rCCC) model, for which we estimate a 1σ error bar of 10 kJ mol⁻¹. Gibbs solvation energies of protons in the following representative solvents were calculated: water, acetonitrile, sulfur dioxide, dimethyl sulfoxide, benzene, diethyl ether, methylene chloride, 1,2-dichloroethane, sulfuric acid, fluorosulfonic acid, and hydrogen fluoride. The obtained values are absolute chemical standard potentials of the proton (pH=0 in this solvent). They are used to anchor the individual solvent-specific acidity (pH) scales to our recently introduced absolute acidity scale.
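
The abstract states only that the per-cluster Gibbs solvation energies were weighted using Boltzmann statistics. One common way to do this, shown here as a sketch and not necessarily the paper's exact scheme, is to weight each cluster's value by exp(-G/RT) and take the weighted mean. The function name, the default temperature of 298.15 K, and the example energies are assumptions chosen for illustration.

```python
import math

R_KJ = 8.314462618e-3  # molar gas constant in kJ mol^-1 K^-1


def boltzmann_average(energies_kj_mol, temperature_k=298.15):
    """Return (Boltzmann-weighted mean energy, list of weights).

    Energies are shifted by their minimum before exponentiation so that
    large negative values do not overflow; the weights are unaffected.
    """
    g_min = min(energies_kj_mol)
    factors = [
        math.exp(-(g - g_min) / (R_KJ * temperature_k)) for g in energies_kj_mol
    ]
    z = sum(factors)
    weights = [f / z for f in factors]
    mean = sum(w * g for w, g in zip(weights, energies_kj_mol))
    return mean, weights


# Made-up energies (kJ/mol) for three hypothetical proton-solvent clusters.
mean, weights = boltzmann_average([-1100.0, -1095.0, -1050.0])
```

Because RT is about 2.5 kJ/mol at room temperature, clusters lying more than a few tens of kJ/mol above the most favorable one contribute almost nothing to the weighted value.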

16.
Abstract: By means of clustering, one is able to manage large databases easily. Clustering according to structure similarity distinguished the several chemical classes that were present in our training set. All the clusters showed correlation of log WS with log K(OW) and melting point, except EINECS-cluster 1. This cluster contains only chemicals with melting points below room temperature, resulting in a log WS-log K(OW) relationship. The observed weak correlation for this cluster is probably due to the insufficient number of available screens. Such a limited number of screens allows very different chemicals to share the same cluster. Using statistical criteria, our approach resulted in three QSARs with reasonably good predictive capabilities, originating from clusters 1639, 3472, and 5830. The models resulting from the smaller clusters 6873, 8154, and 16424 are characterised by high correlation coefficients which describe the cluster itself very well but, due to our stringent bootstrap criterion, they are close to randomness. Clusters 6815 and 18083 showed rather low correlations. The models originating from clusters 1639, 3472, and 5830 proved their usefulness by external validation. The log WS values calculated with our QSARs agreed to within 1 log unit with those reported in the literature.

17.
Current ab initio structure-prediction methods are sometimes able to generate families of folds, one of which is native, but are unable to single out the native one due to imperfections in the folding potentials and an inability to conduct thorough explorations of the conformational space. To address this issue, here we describe a method for the detection of statistically significant folds from a pool of predicted structures. Our approach consists of clustering and averaging the structures into representative fold families. Using a metric derived from the root-mean-square distance (RMSD) that is less sensitive to protein size, we determine whether the simulated structures are clustered in relation to a group of random structures. The clustering method searches for cluster centers and iteratively calculates the clusters and their respective centroids. The centroid interresidue distances are adjusted by minimizing a potential constructed from the corresponding average distances of the cluster structures. Application of this method to selected proteins shows that it can detect the best fold family that is closest to native, along with several other misfolded families. We also describe a method to obtain substructures. This is useful when the folding simulation fails to give a total topology prediction but produces common subelements among the structures. We have created a web server that clusters user-submitted structures, which can be found at http://bioinformatics.danforthcenter.org/services/scar. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 339–353, 2001

18.
19.
We have developed a method, given a database of molecules and associated activities, to identify molecular substructures that are associated with many different biological activities. These may be therapeutic areas (e.g. antihypertensive) and/or mechanism-based activities (e.g. renin inhibitor). This information helps us avoid chemical classes that are likely to have unanticipated side effects and also can suggest combinatorial libraries that might have activity on a variety of receptor targets. The method was applied to the USPDI and MDDR databases. There are clearly substructures in each database that occur in many compounds and span a variety of therapeutic categories. Some of these are expected, but some are not.
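
The abstract does not say how substructures are enumerated or which statistic is used to flag them. As a rough sketch of the underlying bookkeeping only, assuming each database record already lists the substructures found in a molecule together with its annotated activities, one can count how many distinct activities each substructure co-occurs with. The function name and the toy records are invented for illustration.

```python
from collections import defaultdict


def activity_span(records):
    """Count distinct activities per substructure.

    records: iterable of (substructure labels, activity labels) pairs, one per molecule.
    Returns (substructure, count) pairs, widest span first, ties broken alphabetically.
    """
    seen = defaultdict(set)
    for substructures, activities in records:
        for s in substructures:
            seen[s] |= set(activities)
    return sorted(
        ((s, len(acts)) for s, acts in seen.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Invented toy records, not taken from USPDI or MDDR.
records = [
    ({"piperazine", "phenyl"}, {"antihypertensive"}),
    ({"piperazine"}, {"antipsychotic"}),
    ({"phenyl", "tetrazole"}, {"antihypertensive"}),
]
ranking = activity_span(records)
```

A raw count like this favors substructures that are simply common, so a practical analysis would also need to normalize for how often each substructure occurs in the database.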

20.
Chinese Journal of Structural Chemistry (《结构化学》), 2020, 39(7): 1185-1193
Atomic clusters of subnanometer scale and variable chemical composition offer great opportunities for the rational design of functional nanomaterials. Among them, cage clusters doped with an endohedral atom are particularly interesting owing to their enhanced stability and highly tunable physical and chemical properties. In this perspective, we first give a brief overview of the history of doped cage clusters and introduce the home-developed comprehensive genetic algorithm (CGA) for structure prediction of clusters. Then, we show a few examples of magnetic clusters and subnanometer catalysts based on doped cage clusters, which are computationally revealed or designed by the CGA code. Finally, we give an outlook on some future directions of cluster science.
