Similar articles
20 similar articles found.
1.
RNA structure comparison is a fundamental problem in structural biology, structural chemistry, and bioinformatics. It can be used to analyze RNA energy landscapes and conformational switches, and to facilitate RNA structure prediction. The purpose of our integrated tool RNACluster is twofold: to provide a platform for computing and comparing different distances between RNA secondary structures, and to perform cluster identification that derives useful information from RNA structure ensembles, using a minimum spanning tree (MST) based clustering algorithm. RNACluster identifies clusters from an MST representation of the RNA ensemble data and currently supports six distance measures between RNA secondary structures. It provides a user-friendly graphical interface that allows users to compare different structural distances, analyze structure ensembles, and visualize predicted structural clusters.
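As a rough illustration of MST-based clustering of a structure ensemble, the sketch below uses the base-pair distance (the number of base pairs present in only one of two dot-bracket structures) as one possible distance measure. The six measures RNACluster supports are not listed here, so this choice, the function names, and the rule of cutting the MST into `k` pieces are assumptions, not the tool's implementation.

```python
from itertools import combinations


def base_pairs(db):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def bp_distance(a, b):
    """Base-pair distance: number of pairs found in exactly one of the two structures."""
    return len(base_pairs(a) ^ base_pairs(b))


def mst_clusters(structs, k):
    """Kruskal's algorithm stopped once k components remain.

    This is equivalent (up to ties) to building the full MST and
    deleting its k - 1 longest edges.
    """
    parent = list(range(len(structs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        (bp_distance(structs[i], structs[j]), i, j)
        for i, j in combinations(range(len(structs)), 2)
    )
    components = len(structs)
    for _, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    groups = {}
    for i in range(len(structs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With the toy ensemble `["((...))", ".(...).", "..(.)..", "(.(.).)"]` and `k=2`, the two structures sharing the (1, 5) pair and the two sharing the (2, 4) pair end up in separate clusters.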

2.
This paper compares the performance of two clustering methods, DPClus graph clustering and hierarchical clustering, in classifying volatile organic compounds (VOCs) using a fingerprint-based similarity measure between chemical structures. The clustering results from the two methods were compared to determine the degree of cluster overlap and how well each method grouped the chemical structures of VOCs into clusters. We also point out the advantages and limitations of both methods. In conclusion, chemical similarity measures can be used to predict the biological activities of a compound, which can be applied in the medical, pharmaceutical, and agrotechnology fields.
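The abstract does not name the similarity coefficient, so the sketch below assumes the common Tanimoto coefficient on binary fingerprints (stored as sets of "on" bit indices) and pairs it with a naive average-linkage agglomerative clustering on the distance 1 − Tanimoto. The `cutoff` stopping rule is an assumption, and DPClus is not shown.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def average_linkage(fps, cutoff):
    """Naive agglomerative clustering (average linkage) on distance 1 - Tanimoto.

    Merging stops when the closest pair of clusters is farther apart than cutoff.
    """
    clusters = [[i] for i in range(len(fps))]

    def dist(c1, c2):
        # mean pairwise distance between members of the two clusters
        return sum(1 - tanimoto(fps[i], fps[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, a, b = min(
            (dist(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > cutoff:
            break
        clusters[a] += clusters[b]  # b > a, so index a stays valid after the delete
        del clusters[b]
    return clusters
```

The quadratic search for the closest pair keeps the sketch short; a production tool would use a priority queue or an existing hierarchical clustering library.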

3.
4.
5.
The agglomerative clustering methods and the tests usually applied to evaluate the significance of clusters are critically evaluated. Many clustering techniques can provide erroneous information about the existence of clusters. The single linkage technique is suggested for identifying natural, well-separated clusters. The existing statistical tests on the significance of clusters are not satisfactory. A new statistical test, based on the distribution of the distances between the objects and their first nearest neighbor, is presented. The performance of the test is compared with that of the Sneath test and the variance-ratio test on some artificial and real data sets.
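The abstract does not give the test's null distribution, so the sketch below only computes the raw quantity the proposed test is built on: each object's distance to its first nearest neighbor, here assumed to be Euclidean.

```python
import math


def nearest_neighbor_distances(points):
    """Euclidean distance from each object to its first nearest neighbor."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
```

For two tight, distant groups these distances stay small compared with the gap between the groups; that contrast is what a nearest-neighbor-based significance test can exploit, whereas an isolated object shows up as a single large value.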

6.
Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences, which takes computer memory and time proportional to N² for N sequences. For small N, say up to 10,000 or so, this can be done in reasonable time for sequences of moderate length. For very large N, however, it becomes increasingly prohibitive. In this paper, we have tested variations on a class of published embedding methods designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods embed the sequences in a space where the similarities within a set of sequences can be closely approximated without computing all pairwise distances. We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences, and we demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignments. Source code is available on request from the authors.
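A minimal sketch of the embedding idea, assuming each sequence is represented by its distances to a small random set of reference sequences, so only N × `n_refs` distances are computed instead of N(N − 1)/2. The k-mer distance, the random choice of references, and all names here are illustrative assumptions; the paper's specific embedding variants are not reproduced.

```python
import random


def kmer_distance(a, b, k=3):
    """1 - Jaccard similarity of the sets of distinct k-mers (cheap, alignment-free)."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    return 1 - len(ka & kb) / len(union) if union else 0.0


def embed(seqs, n_refs, seed=0):
    """Map each sequence to a vector of distances to n_refs reference sequences."""
    refs = random.Random(seed).sample(range(len(seqs)), n_refs)
    return [[kmer_distance(s, seqs[r]) for r in refs] for s in seqs]
```

The resulting vectors can then be clustered with an ordinary vector method such as k-means, which never needs the full N × N matrix.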

7.
We discuss three dissimilarity measures between dendrograms defined over the same set: the triples, partition, and cluster indices. All of them decompose the dendrograms into subsets. For the triples and partition indices, these subsets correspond to binary partitions containing some clusters, whereas for the cluster index, a novel dissimilarity measure introduced in this paper, the subsets are exclusively clusters. In chemical applications, dendrograms gather clusters that carry similarity information about the data set under study. The cluster index is therefore the most suitable dissimilarity measure between dendrograms resulting from chemical investigations. An application example of the three measures highlights the advantages of the cluster index over the other two in similarity studies. Finally, the cluster index is used to measure the differences between the five dendrograms obtained by applying five common hierarchical clustering algorithms to a database of 1000 molecules.
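To show what "decomposing a dendrogram into clusters" means, the sketch below collects the leaf set of every internal node of a dendrogram written as nested tuples. The normalization in `cluster_dissimilarity` (clusters found in only one dendrogram, divided by all distinct clusters) is a hypothetical stand-in; the abstract does not give the exact formula of the cluster index.

```python
def clusters(tree):
    """Leaf sets of all internal nodes of a dendrogram given as nested tuples."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])  # a leaf
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def cluster_dissimilarity(t1, t2):
    """Hypothetical cluster-based index: share of clusters present in only one dendrogram."""
    c1, c2 = clusters(t1), clusters(t2)
    return len(c1 ^ c2) / len(c1 | c2)
```

For `(((a, b), c), d)` versus `((a, b), (c, d))`, the clusters {a, b} and {a, b, c, d} are shared while {a, b, c} and {c, d} are not, giving 2/4 = 0.5 under this normalization.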

8.
A hierarchical clustering algorithm, NIPALSTREE, was developed that can analyze large data sets in high-dimensional space. The result can be displayed as a dendrogram. At each tree level, the algorithm projects the data set onto one dimension via principal component analysis. The data set is sorted along this dimension and split at the median position. To avoid distorting clusters at the median, the algorithm identifies a potentially better split point to the left or right of the median. The procedure is applied recursively to the resulting subsets until the maximal distance between cluster members exceeds a user-defined threshold. The approach was validated in a retrospective screening study for angiotensin converting enzyme (ACE) inhibitors. The resulting clusters were assessed for their purity and their enrichment in actives belonging to this ligand class. Enrichment was observed in individual branches of the dendrogram. In further retrospective virtual screening studies employing the MDL Drug Data Report (MDDR), COBRA, and the SPECS catalog, NIPALSTREE was compared with the hierarchical k-means clustering approach. The results show that both algorithms can be used in the context of virtual screening. Intersecting the result lists obtained with both algorithms improved enrichment factors while losing only a few chemotypes.
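A minimal sketch of the recursion described above: project onto the first principal component (computed here by NIPALS-style power iteration), sort, split at the median, and recurse. Two simplifications are assumptions: the refinement that searches for a better split point near the median is omitted, and we read the stopping rule as "stop splitting once a subset's diameter is at or below the threshold".

```python
import math


def first_pc_scores(X, iters=100):
    """Scores on the first principal component via NIPALS power iteration on centred data."""
    n, m = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - mean[j] for j in range(m)] for row in X]
    # start from the column with the largest variance
    start = max(range(m), key=lambda j: sum(r[j] ** 2 for r in Xc))
    t = [r[start] for r in Xc]
    for _ in range(iters):
        tt = sum(v * v for v in t)
        if tt == 0:
            return t  # all points identical: nothing to project
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # loadings
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        t = [sum(Xc[i][j] * p[j] for j in range(m)) for i in range(n)]  # scores
    return t


def diameter(X, idx):
    """Largest pairwise Euclidean distance among the points indexed by idx."""
    return max((math.dist(X[a], X[b]) for a in idx for b in idx), default=0.0)


def nipals_tree(X, idx, threshold):
    """Leaf clusters of a NIPALSTREE-like recursion (plain median split on PC1 scores)."""
    if len(idx) < 2 or diameter(X, idx) <= threshold:
        return [idx]
    scores = first_pc_scores([X[i] for i in idx])
    order = [i for _, i in sorted(zip(scores, idx))]
    mid = len(order) // 2
    return nipals_tree(X, order[:mid], threshold) + nipals_tree(X, order[mid:], threshold)
```

Keeping the full recursion (not just the leaves) would give the dendrogram mentioned in the abstract; the sketch returns only the leaf clusters for brevity.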

9.
Haplotype reconstruction, based on aligned single nucleotide polymorphism (SNP) fragments, aims to infer a pair of haplotypes from localized polymorphism data gathered through short genome fragment assembly. This paper first presents two distance functions, which measure the degree of difference and the degree of similarity between SNP fragments. Based on these two distance functions, a clustering algorithm is proposed to solve the minimum error correction (MEC) model. The algorithm has two stages: the first determines the initial haplotype pair, and the second infers the true haplotype pair by re-clustering. Comparative results show that the algorithm using the two distance functions is effective and feasible.
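The paper's two distance functions are not given in the abstract, so the sketch below uses a generic stand-in: the number of mismatching alleles at SNP positions covered by both a fragment and a haplotype (`-` marks an uncovered site). It alternates between assigning fragments to the closer of two haplotypes and rebuilding each haplotype by majority vote, and reports the MEC score (total mismatches that would have to be corrected). The complement-based second start haplotype and all names are assumptions.

```python
def mismatches(frag, hap):
    """Sites covered by both strings ('-' = not covered) whose alleles disagree."""
    return sum(1 for f, h in zip(frag, hap) if f != "-" and h != "-" and f != h)


def consensus(frags, length):
    """Majority allele per site among covering fragments ('-' if none cover it)."""
    out = []
    for j in range(length):
        calls = [f[j] for f in frags if f[j] != "-"]
        out.append(max("01", key=calls.count) if calls else "-")
    return "".join(out)


def two_haplotypes(frags, h1, iters=10):
    """Re-cluster fragments around h1 and its complement, then rebuild both by majority vote."""
    flip = {"0": "1", "1": "0", "-": "-"}
    h2 = "".join(flip[c] for c in h1)
    for _ in range(iters):
        g1 = [f for f in frags if mismatches(f, h1) <= mismatches(f, h2)]
        g2 = [f for f in frags if mismatches(f, h1) > mismatches(f, h2)]
        h1, h2 = consensus(g1, len(h1)), consensus(g2, len(h1))
    return h1, h2


def mec_score(frags, h1, h2):
    """Minimum error correction cost: each fragment is charged against its closer haplotype."""
    return sum(min(mismatches(f, h1), mismatches(f, h2)) for f in frags)
```

Starting from the slightly wrong haplotype `"0100"`, the toy fragments in the test below are re-clustered into the pair `("0101", "1010")`, which explains every fragment with an MEC score of 0.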

10.
We describe a method for locating clusters of geometrically similar conformers in ensembles of chemical conformations. We first calculate the pairwise interconformational distance matrix in either torsional or Cartesian space and then use an agglomerative, single-link clustering method to define a hierarchy of clusterings in the same space. Especially good clusterings are distinguished by high values of the separation ratio: the ratio of the shortest intercluster distance to the characteristic threshold distance defining the clustering. We also discuss other statistics. The method has been embodied in a program called XCluster, which can display the distance matrix, the hierarchy of clusterings, and the clustering statistics in a variety of formats. XCluster can also write out the clustered conformations for subsequent or simultaneous viewing with a molecular visualization program. We demonstrate the sorts of insight that this approach affords with examples obtained from conformational search and molecular dynamics procedures. © 1994 by John Wiley & Sons, Inc.
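A minimal sketch of single-link clustering at a threshold `t` together with the separation ratio defined above (shortest intercluster distance divided by `t`). The input is any precomputed distance matrix; in XCluster's setting it would hold torsional or Cartesian interconformational distances, while the one-dimensional toy data in the test is purely illustrative.

```python
import math
from itertools import combinations


def single_link(D, t):
    """Single-link clusters at threshold t: connected components of the graph with edges D[i][j] <= t."""
    n = len(D)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first flood fill of one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and D[i][j] <= t:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def separation_ratio(D, labels, t):
    """Shortest intercluster distance divided by the threshold t (inf if only one cluster)."""
    inter = min(
        (D[i][j] for i, j in combinations(range(len(D)), 2) if labels[i] != labels[j]),
        default=math.inf,
    )
    return inter / t
```

Scanning `t` over the distinct values in `D` walks through the single-link hierarchy; by construction the ratio always exceeds 1, and clusterings where it jumps well above 1 are the well-separated ones the abstract singles out.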

11.
Density-based spatial clustering of applications with noise (DBSCAN) is an unsupervised classification algorithm that has been widely used in many areas owing to its simplicity and its ability to handle hidden clusters of different sizes and shapes as well as noise. However, the computational cost of the distance table and the instability in detecting the boundaries of adjacent clusters limit the application of the original algorithm to large datasets such as images. In this paper, the DBSCAN algorithm is revised and improved for image clustering and segmentation. The proposed algorithm has two major advantages over the original. First, by using the coordinate system of the image data, the revised DBSCAN algorithm is applicable to large 3D image datasets (often with millions of pixels). Second, the revised algorithm resolves the instability of boundary detection in the original DBSCAN. For broader applications, the image dataset can be an ordinary 3D image or, more generally, the classification result of another type of image data, e.g., a multivariate image.
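The sketch below is not the revised algorithm from the paper. It only illustrates the first idea in the abstract: on a pixel grid, candidate neighbors can be enumerated from image coordinates within a small window, so no N × N distance table is needed. The window `radius`, the intensity tolerance `eps`, and the 2D (rather than 3D) setting are assumptions. Border pixels are still assigned in scan order, so the boundary instability the paper addresses is left in place.

```python
from collections import deque


def grid_dbscan(img, eps, min_pts, radius=1):
    """DBSCAN over a 2-D intensity grid; returns a grid of labels (-1 = noise)."""
    h, w = len(img), len(img[0])
    labels = [[None] * w for _ in range(h)]  # None = not yet visited

    def neighbours(r, c):
        # Only pixels inside a (2*radius+1)^2 window are candidates.
        out = []
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and abs(img[rr][cc] - img[r][c]) <= eps:
                    out.append((rr, cc))
        return out  # includes (r, c) itself, as in standard DBSCAN

    cluster = 0
    for r in range(h):
        for c in range(w):
            if labels[r][c] is not None:
                continue
            seeds = neighbours(r, c)
            if len(seeds) < min_pts:
                labels[r][c] = -1  # provisional noise; may later become a border pixel
                continue
            labels[r][c] = cluster
            queue = deque(seeds)
            while queue:
                rr, cc = queue.popleft()
                if labels[rr][cc] == -1:
                    labels[rr][cc] = cluster  # noise reached from a core pixel becomes border
                    continue
                if labels[rr][cc] is not None:
                    continue
                labels[rr][cc] = cluster
                more = neighbours(rr, cc)
                if len(more) >= min_pts:  # only core pixels expand the cluster
                    queue.extend(more)
            cluster += 1
    return labels
```

Each pixel inspects at most (2 × `radius` + 1)² candidates, so the cost grows linearly with the number of pixels instead of quadratically.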

12.
We report a benchmark calculation for the fuzzy c-means clustering algorithm that can serve as a reference in theoretical and practical studies of classification methodologies. The hard-initialization space is fully explored for all possible different groupings of a simple fifteen-pattern system in order to describe their stationary points. Numerical problems associated with the stopping criteria are discussed in relation to the calculation of some validity indexes. All the information needed to reproduce the results easily is clearly reported.
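For reference, the standard fuzzy c-means iteration alternates a weighted-mean center update with the membership update u_ik = 1 / Σ_j (d_ik / d_jk)^(2/(m−1)). The sketch below runs it on one-dimensional data starting from a hard partition, in the spirit of the hard-initialization study. The fuzzifier m = 2, the tolerance on the largest membership change as the stopping criterion, and the requirement that the starting partition leave no cluster empty are assumptions of this illustration.

```python
def fuzzy_c_means(x, labels, c, m=2.0, tol=1e-9, max_iter=300):
    """1-D fuzzy c-means started from a hard partition (labels[k] = initial cluster of x[k])."""
    n = len(x)
    U = [[1.0 if labels[k] == i else 0.0 for k in range(n)] for i in range(c)]
    for _ in range(max_iter):
        # centre update: weighted mean with weights u_ik^m (each cluster must be non-empty at start)
        centers = []
        for i in range(c):
            w = [u ** m for u in U[i]]
            centers.append(sum(wk * xk for wk, xk in zip(w, x)) / sum(w))
        # membership update
        new_U = [[0.0] * n for _ in range(c)]
        for k, xk in enumerate(x):
            d = [abs(xk - v) for v in centers]
            if 0.0 in d:  # pattern sits exactly on a centre: full membership there
                new_U[d.index(0.0)][k] = 1.0
                continue
            for i in range(c):
                new_U[i][k] = 1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c))
        shift = max(abs(a - b) for ra, rb in zip(U, new_U) for a, b in zip(ra, rb))
        U = new_U
        if shift < tol:  # stopping criterion on the membership matrix
            break
    return centers, U
```

Swapping the stopping test (membership change versus center change, or a different `tol`) is exactly where the numerical issues mentioned in the abstract can enter.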

13.
Is There a “Most Chiral Tetrahedron”?
A degree of chirality is a function that purports to measure the amount of chirality of an object: it is equal for enantiomers, vanishes only for achiral or degenerate objects and is similarity invariant, dimensionless and normalisable to the interval [0,1]. For a tetrahedron of non-zero three-dimensional volume, achirality is synonymous with the presence of a mirror plane containing one edge and bisecting its opposite, and hence it is easy to design degree-of-chirality functions based on edge length that incorporate all constraints. It is shown that such functions can have largest maxima at widely different points in the tetrahedral shape space, and by incorporation of appropriate factors, the maxima can be pushed to any point in the space. Thus the phrase "most chiral tetrahedron" has no general meaning: any chiral tetrahedron is the most chiral for some legitimate choice of degree of chirality.
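The edge-length criterion above can be checked directly. A reflection that fixes vertices A and B and swaps C and D exists exactly when A and B are each equidistant from C and D, because then the perpendicular bisector plane of CD contains edge AB. The sketch below tests all six edge choices for a non-degenerate tetrahedron. It only decides achiral versus chiral and does not construct a degree-of-chirality function; the tolerance `tol` is an assumption.

```python
import math
from itertools import combinations


def is_achiral(P, tol=1e-9):
    """True if the (non-degenerate) tetrahedron with vertices P has a mirror plane.

    The mirror plane must contain one edge (a, b) and bisect the opposite edge (c, e),
    which holds iff |ac| = |ae| and |bc| = |be|.
    """
    d = {frozenset(pair): math.dist(P[pair[0]], P[pair[1]]) for pair in combinations(range(4), 2)}
    for a, b in combinations(range(4), 2):
        c, e = [i for i in range(4) if i not in (a, b)]
        if (abs(d[frozenset((a, c))] - d[frozenset((a, e))]) <= tol
                and abs(d[frozenset((b, c))] - d[frozenset((b, e))]) <= tol):
            return True
    return False
```

A regular tetrahedron and the corner tetrahedron (0,0,0), (1,0,0), (0,1,0), (0,0,1) pass the test, while a tetrahedron with three unequal legs along the axes, such as (0,0,0), (1,0,0), (0,2,0), (0,0,3), has six distinct edge lengths and is chiral.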

14.
In this paper, the performance of new clustering methods such as Neural Gas (NG) and Growing Neural Gas (GNG) is compared with that of the K-means method on real and simulated data sets. Moreover, a new algorithm called growing K-means (GK) is introduced as an alternative to Neural Gas and Growing Neural Gas. It has small input requirements and is conceptually very simple. GK leads to nearly optimal values of the cost function and, unlike K-means, is independent of the initial partition of the data set. The incremental property of GK additionally helps to estimate the number of "natural" clusters in the data, i.e., the well-separated groups of objects in the data space.
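The abstract does not spell out GK's update rules, so the sketch below is a generic incremental K-means written only to show what "growing" and the cost history look like. It starts from one center (the data mean), and at each step it adds a new center at the point farthest from its nearest current center (a hypothetical insertion rule) and then runs Lloyd iterations. Because the start is deterministic, the result does not depend on a random initial partition, and a sharp drop followed by a plateau in `history` hints at the number of natural clusters.

```python
import math


def lloyd(points, centers, iters=50):
    """Plain K-means (Lloyd) iterations from the given centres."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[j].append(p)
        # keep a centre in place if its group is empty
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def cost(points, centers):
    """K-means cost: sum of squared distances to the nearest centre."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def growing_kmeans(points, k_max):
    """Hypothetical growing scheme: add the worst-served point as a new centre, then refit."""
    centers = [tuple(sum(col) / len(points) for col in zip(*points))]
    history = [cost(points, centers)]
    while len(centers) < k_max:
        worst = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers = lloyd(points, centers + [tuple(worst)])
        history.append(cost(points, centers))
    return centers, history
```

On three well-separated pairs of points, the cost falls steeply up to three centers and would barely improve beyond that, which is the elbow-style signal an incremental method can provide.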

15.
16.
We propose a new similarity measure operating in the space spanned by the potential values evaluated at the atoms constituting the benzene ring and the COOH group in para-substituted benzoic acids, and at the benzene ring atoms in monosubstituted benzenes. The similarity measures are equivalent to the Euclidean distance between points in that space. Only the distances between the potentials at corresponding atoms in different molecules are included. The distances for benzene rings were very similar regardless of whether they were calculated for para-substituted acids or for monosubstituted benzenes. The dissociation of benzoic acids and the nitration of monosubstituted benzenes were used as reference reactions. The effect of reducing the dimensionality of the potential space on the comparison between the similarity measures and the free energies of the reference reactions was investigated. It became evident that the potentials at individual atoms in the molecules of the acids and monosubstituted benzenes are highly mutually correlated.
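In code, the measure reduces to a Euclidean distance between two vectors of atomic potentials listed in the same atom order. The optional `atoms` argument, a hypothetical parameter for this sketch, restricts the comparison to a subset of corresponding atoms, which is one simple way to reduce the dimensionality of the potential space. The numbers in the test are toy values, not computed potentials.

```python
import math


def potential_distance(v1, v2, atoms=None):
    """Euclidean distance between potentials at corresponding atoms of two molecules.

    v1[i] and v2[i] must refer to the same (corresponding) atom position.
    atoms, if given, selects the atom indices to keep (reduced dimensionality).
    """
    if len(v1) != len(v2):
        raise ValueError("potentials must be listed for the same corresponding atoms")
    idx = range(len(v1)) if atoms is None else atoms
    return math.sqrt(sum((v1[i] - v2[i]) ** 2 for i in idx))
```

Because the potentials at different atoms are strongly correlated, as the abstract reports, dropping atoms in this way can change the distances less than the reduction in dimension might suggest.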

17.
This work describes the application of a Bayesian method for clustering protein conformations sampled during a molecular dynamics simulation of the HIV-1 integrase catalytic core. The clustering analysis assumes normally distributed data and does not fix the number of clusters in advance. Performance measures such as posterior probability and class cross entropy are used to determine the most probable set of clusters. The Bayesian clustering method yields meaningful groups that identify transitions between conformational ensembles. The dihedral angles involved in these transitions are also examined in detail. The conformations in high-dimensional space are projected into 3D space with a multidimensional scaling technique for visual inspection.

18.
Pharmacophore modeling of large, drug-like molecules, such as the dopamine reuptake inhibitor GBR 12909, is complicated by their flexibility. A comprehensive hierarchical clustering study of two GBR 12909 analogs was performed to identify representative conformers for input to three-dimensional quantitative structure–activity relationship studies of closely related analogs. Two data sets of more than 700 conformers each, produced by random search conformational analysis of a piperazine and a piperidine GBR 12909 analog, were studied. Several clustering studies were carried out based on different feature sets that include the important pharmacophore elements. The distance maps, the plot of the effective number of clusters versus the actual number of clusters, and a novel derived clustering statistic, the percentage change in the effective number of clusters, were shown to be useful in determining the appropriate clustering level. Six clusters were chosen for each analog, each representing a different region of the torsional angle space that determines the relative orientation of the pharmacophore elements. Conformers representative of these regions were identified in each cluster and compared for each analog. This study illustrates the utility of hierarchical clustering for classifying the conformers of highly flexible molecules in terms of the three-dimensional spatial orientation of key pharmacophore elements.

19.
In this article, the interdisciplinary science of clusters is discussed in general terms. Different types of clusters across vast scales of matter, energy, space, and time in the physical world are discussed. Specific examples of clusters in chemistry and physics are used to illustrate various principles or models of clustering processes of atoms and molecules as well as to demonstrate the exquisite beauty and pattern of clusters and the clustering phenomena so ubiquitous in nature. Nowadays, “designer clusters” can be made with tailorable properties and used as “building blocks” to form supermolecules, or to construct large cluster-based hierarchical materials with tunable properties, or to fabricate cluster-based devices with specific functions, etc., thereby providing a materials base for nanotechnology. Clustering is a spontaneous self-assembly process and the similarity across scales reflects the intrinsic self-organization and self-similarity principle of the physical world. Geometry and symmetry transcend all clustering processes, in ordered as well as in disordered systems.

20.
Euclidean geometry and information and fuzzy-set theory are used to develop general criteria for the evaluation of clustering methods. A separation function, describing the geometric clustering in a feature space for a given separation state, is introduced. Suitable clustering algorithms for given data can be selected by using the measure derived. The criteria developed are used in studies of the homogeneity of solids.
