Similar literature
 20 similar documents found (search time: 15 ms)
1.
2.
3.
4.
A robust method was developed to cluster similar NMR spectra from partially purified extracts obtained from a range of marine sponges and a plant biota. The NMR data were acquired using microtiter plate NMR (VAST) in protonated solvents. A sample data set containing several clusters was used to optimize the protocol. Robustness was evaluated using three different clustering methods: tree clustering analysis, K-means clustering and multidimensional scaling. These methods were compared for consistency using the sample data set, and the optimized methodology was then applied to clustering a set of spectra from partially purified biota extracts.

5.
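The extractives of Pinellia (banxia) crude drug stored for different numbers of years were studied, and an HPLC characteristic fingerprint of the extractives was established to provide a reference for quality evaluation and control of the drug. Extractives were determined by the pharmacopoeia method. The chromatographic conditions for the HPLC fingerprint were: a C18 column (150 mm × 4.6 mm, 5 μm), water-methanol as the mobile phase with gradient elution, a flow rate of 0.8 mL/min, a detection wavelength of 260 nm, a column temperature of 25 °C, and an injection volume of 50 μL. Similarity evaluation and cluster analysis were used to reveal the similarities and differences among 14 batches of samples. Of the 14 batches of Pinellia extractives, 12 met the requirement and 2 did not. HPLC fingerprints of the 14 batches were established and 3 common peaks were identified; the relative standard deviation (RSD) of the common-peak retention times was below 2%, whereas the RSDs of the peak areas varied considerably. Samples No. 1 to No. 7 shared 12 common peaks, with retention-time RSDs below 1.5% and peak-area RSDs that again varied considerably. The chemical composition and content differed to some extent between batches. Starting from the extractive data and the HPLC fingerprint data, combining fingerprint similarity evaluation with cluster analysis and assessing quality from both the extractive content and the results of the evaluation software allows more precise quality control of Pinellia crude drug.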

6.
Multispectral images such as multispectral chemical images or multispectral satellite images provide detailed data with information in both the spatial and spectral domains. Many segmentation methods for multispectral images are based on a per-pixel classification, which uses only spectral information and ignores spatial information. A clustering algorithm based on both spectral and spatial information would produce better results.

In this work, spatial refinement clustering (SpaRef), a new clustering algorithm for multispectral images, is presented. Spatial information is integrated with partitional and agglomerative clustering processes. The number of clusters is identified automatically. SpaRef is compared with a set of well-known clustering methods on compact airborne spectrographic imager (CASI) data acquired over an area in the Klompenwaard, The Netherlands. The clusters obtained show improved results. Applying SpaRef to multispectral chemical images would be a straightforward step.


7.
Serial analysis of gene expression (SAGE) is a powerful tool for obtaining gene expression profiles, and clustering analysis is a valuable technique for analyzing SAGE data. In this paper, we propose an adaptive clustering method for SAGE data analysis, namely PoissonAPS. The method incorporates a novel clustering algorithm, Affinity Propagation (AP). Although the AP algorithm has demonstrated good performance on many different data sets, it also faces several limitations. PoissonAPS overcomes the limitations of AP by using the clustering validation measure as a cost function for merging and splitting; as a result, it can automatically cluster SAGE data without user-specified parameters. We evaluated PoissonAPS and compared its performance with other methods on several real-life SAGE data sets. The experimental results show that PoissonAPS can produce meaningful and interpretable clusters for SAGE data.

8.
Accelerated K-means clustering in metric spaces   Cited by: 1 (self-citations: 0, citations by others: 1)
The K-means method is a popular technique for clustering data into k partitions. In the adaptive form of the algorithm, Lloyd's method, an iterative procedure alternately assigns cluster membership based on a set of centroids and then redefines the centroids based on the computed cluster membership. The most time-consuming part of this algorithm is determining which of the points being clustered belong to which cluster center. This paper discusses the use of the vantage-point tree as a method of assigning cluster membership more quickly when the points being clustered belong to intrinsically low- and medium-dimensional metric spaces. Results will be discussed from simulated data sets and from real-world data in the clustering of molecular databases based upon physicochemical properties. Comparisons will be made to a highly optimized brute-force implementation of Lloyd's method and to other pruning strategies.
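The two alternating steps described in this abstract can be sketched as follows. This is a minimal brute-force illustration of Lloyd's iteration, not the paper's vantage-point-tree acceleration; the function name `lloyd_kmeans`, the Euclidean distance, and the toy data are assumptions made for the example.

```python
import math


def lloyd_kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iteration with brute-force assignment (illustrative only)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        # This is the costly part that a vantage-point tree would speed up.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members)
                          for d in range(len(old)))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = lloyd_kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The assignment step compares every point with every centroid, which is the cost the abstract proposes to reduce with a metric-space index.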

9.
In the current study, a multiwavelength detection and color-scale HPTLC fingerprinting procedure was combined with a chemometric approach for direct clustering of a set of medicinal plants from different geographical growing areas. The fingerprint profiles of the hydroalcoholic extracts, obtained after single and double development and detection at 254 nm and 365 nm, before and after selective spraying with specific derivatization reagents, were evaluated by chemometric approaches. Principal component analysis (PCA) and factor analysis (FA) were used to reveal the contribution of the red (R), green (G), blue (B) and gray (K) color-scale fingerprints to the HPTLC classification of the analyzed samples. Hierarchical cluster analysis (HCA) was used to classify the medicinal plants based on the similarity of their color-scale fingerprint patterns. The 1-Pearson distance measure with Ward's amalgamation procedure proved to be the most convenient approach for correct clustering of the samples. Color-scale fingerprint data obtained with the double development procedure and multiple visualization modes, combined with appropriate chemometric methods, detected similar medicinal plant extracts even when they came from different geographical regions, had been stored under different conditions, and no specific markers had been individually extracted. This approach could be proposed as a promising tool for authentication and identification studies of plant materials based on HPTLC fingerprinting analysis.
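The 1-Pearson distance named in this abstract can be illustrated with a short sketch. It covers only the distance measure, not Ward's amalgamation step or the authors' software; the function name `pearson_distance` and the example profiles are hypothetical.

```python
import math


def pearson_distance(x, y):
    """Return 1 - r, where r is the Pearson correlation of two profiles.

    Assumes both profiles have the same length and are not constant
    (a constant profile has zero variance and r is undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


# Hypothetical intensity profiles: same shape, different scale.
same_shape = pearson_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Mirror-image profiles are maximally distant.
opposite = pearson_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Because the measure depends on profile shape rather than absolute intensity, two fingerprints that differ only by a scale factor are treated as identical.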

10.
We discuss three dissimilarity measures between dendrograms defined over the same set: the triples, partition, and cluster indices. All of them decompose the dendrograms into subsets. In the case of the triples and partition indices, these subsets correspond to binary partitions containing some clusters, while in the cluster index, a novel dissimilarity method introduced in this paper, the subsets are exclusively clusters. In chemical applications, dendrograms gather clusters that contain similarity information about the data set under study. The cluster index is therefore the most suitable dissimilarity measure between dendrograms resulting from chemical investigation. An application example of the three measures is given to highlight the advantages of the cluster index over the other two methods in similarity studies. Finally, the cluster index is used to measure the differences between five dendrograms obtained by applying five common hierarchical clustering algorithms to a database of 1000 molecules.

11.
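Classification of Ganoderma (lingzhi) using chromatographic fingerprints and chemometric methods   Cited by: 2 (self-citations: 0, citations by others: 2)
Zhang Jingli, Luo Xia, Zheng Linyong, Xu Xiaoyan, Ye Liming. Chinese Journal of Chromatography (《色谱》), 2009, 27(6): 776-780
With 95% ethanol as the extraction solvent, high-performance liquid chromatography (HPLC) fingerprinting was combined with chemometric methods to classify the fruiting bodies of 11 different Ganoderma strains. Similarity analysis yielded the 13 common peaks of the extract fingerprints and the similarity between each pair of samples. Using the relative common-peak areas as the analysis parameters, chemometric methods including hierarchical cluster analysis (HCA), principal component analysis (PCA) and discriminant analysis (DA) were applied, and the samples were classified into three groups: zizhi (purple Ganoderma), chizhi (red Ganoderma) and American giant Ganoderma. The results show that chemometric analysis of the fingerprint data of Ganoderma samples is a sound method for their classification.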

12.
Summary: The performance of neural networks in classifying mass spectral data is evaluated and compared to methods of multivariate data analysis and pattern recognition. Back-propagation networks are matched with linear discriminant analysis, and Kohonen feature maps are compared to the k-nearest neighbour clustering algorithm. Eight classifiers were trained in order to discriminate mass spectra of steroids from eight distinct classes of chemical compounds. The results show slightly better performance of Kohonen networks compared to k-nearest neighbour clustering, and equal performance of multi-layer perceptrons and discriminant analysis.

13.
Multi-wavelength fingerprints of Cassia seed, a traditional Chinese medicine (TCM), were collected by high-performance liquid chromatography (HPLC) at two wavelengths using diode array detection. The two data sets of chromatograms were combined by a data fusion-based method. The fused data set of fingerprints was compared separately with the two data sets collected at each of the two wavelengths. Principal component analysis (PCA) demonstrated that the multi-wavelength fingerprints provided a much improved representation of the differences between the samples. Thereafter, the multi-wavelength fingerprint data set was submitted for classification to a suite of chemometrics methods, viz. fuzzy clustering (FC), SIMCA, and the rank-ordering MCDM methods PROMETHEE and GAIA. Each method highlighted different properties of the data matrix according to the fingerprints from different types of Cassia seeds. In general, the PROMETHEE and GAIA MCDM methods provided the most comprehensive information for matching and discrimination of the fingerprints, and appeared to be best suited for quality assurance purposes for these and similar types of samples.

14.
A set of 35 uranium ore and 10 yellow cake samples, collected worldwide from different mines and production sites, was analyzed for its impurity spectrum by ICP-MS. Pattern recognition techniques such as cluster analysis were applied to the data set in order to characterize the samples in relation to their geographical origin. The results show a clear relationship between samples taken from the same geological origin and constitute a satisfactory fingerprint for establishing the origin of the material. In addition to the impurity data, data on the isotopic composition of radiogenic lead are used to resolve ambiguity when impurity cluster analysis fails to deliver unambiguous origin data.

15.
As a result of recent developments in high-throughput screening for drug discovery, the number of available screening compounds has been growing rapidly. Chemical vendors provide millions of compounds; however, these compounds are highly redundant. Clustering analysis, a technique that groups similar compounds into families, can be used to analyze such redundancy. Many available clustering methods focus on accurate classification of compounds; they are slow and are not suitable for very large compound libraries. A fast clustering method based on an incremental clustering algorithm and the 2D fingerprints of compounds is described here. This method can cluster a very large data set with millions of compounds in hours on a single computer. A program implementing this method, called cd-hit-fp, is available from http://chemspace.org.
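The general idea of incremental clustering on 2D fingerprints can be sketched as below. This is a generic greedy single-pass scheme with Tanimoto similarity, written under the assumption that each fingerprint is a set of on-bit indices; it is not the cd-hit-fp implementation, and the function names, threshold, and toy fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def incremental_cluster(fingerprints, threshold=0.7):
    """Greedy single-pass clustering.

    Each compound joins the first cluster whose representative is similar
    enough; otherwise it becomes the representative of a new cluster.
    """
    representatives = []  # index of the first member of each cluster
    assignment = []       # cluster id for each compound, in input order
    for i, fp in enumerate(fingerprints):
        for cluster_id, rep in enumerate(representatives):
            if tanimoto(fp, fingerprints[rep]) >= threshold:
                assignment.append(cluster_id)
                break
        else:
            # No representative was similar enough: start a new cluster.
            representatives.append(i)
            assignment.append(len(representatives) - 1)
    return assignment


# Hypothetical fingerprints: two pairs of near-duplicates.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 12, 13}]
clusters = incremental_cluster(fps, threshold=0.5)
```

Each compound is compared only with cluster representatives rather than with every other compound, which is why schemes of this kind scale to large libraries; the result does depend on the input order.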

16.
The tremendous increase in chemical structure and biological activity data brought about through combinatorial chemistry and high-throughput screening technologies has created the need for sophisticated graphical tools for visualizing and exploring structure-activity data. Visualization plays an important role in exploring and understanding relationships within such multidimensional data sets. Many chemoinformatics software applications apply standard clustering techniques to organize structure-activity data, but they differ significantly in their approaches to visualizing clustered data. Molecular Property eXplorer (MPX) is unique in its presentation of clustered data in the form of heatmaps and tree-maps. MPX employs agglomerative hierarchical clustering to organize data on the basis of the similarity between 2D chemical structures or similarity across a predefined profile of biological assay values. Visualization of hierarchical clusters as tree-maps and heatmaps provides simultaneous representation of cluster members along with their associated assay values. Tree-maps convey both the spatial relationship among cluster members and the value of a single property (activity) associated with each member. Heatmaps provide visualization of the cluster members across an activity profile. Unlike a tree-map, however, a heatmap does not convey the spatial relationship between cluster members. MPX seamlessly integrates tree-maps and heatmaps to represent multidimensional structure-activity data in a visually intuitive manner. In addition, MPX provides tools for clustering data on the basis of chemical structure or activity profile, displaying 2D chemical structures, and querying the data over a specified activity range or a set of chemical structure criteria (e.g., Tanimoto similarity, substructure match, and "R-group" analysis).

17.
18.
We discuss the clustering of 234 environmental samples resulting from an extensive monitoring program concerning soil lead content, plant lead content, traffic density, and distance from the road at different sampling locations in former East Germany. Given the structure of the data and the unsatisfactory results obtained with classical clustering and principal component analysis, fuzzy clustering appeared to be one of the best solutions. We applied several fuzzy clustering algorithms in the following order: the fuzzy c-means (FCM) algorithm; the Gustafson–Kessel (GK) algorithm, which can detect clusters of ellipsoidal shape in data by introducing an adaptive distance norm for each cluster; and the fuzzy c-varieties (FCV) algorithm, which was developed for recognition of r-dimensional linear varieties (lines, planes or hyperplanes) in high-dimensional data. Fuzzy clustering with a convex combination of point prototypes and different multidimensional linear prototypes is also discussed and applied for the first time in analytical chemistry (environmetrics). The results obtained in this study show the advantages of the FCV and GK algorithms over the FCM algorithm. The performance of each algorithm is illustrated by graphs and evaluated by the values of some conventional cluster validity indices. The values of the validity indices are in very good agreement with the quality of the clustering results.
Figure: Projection of all samples on the plane defined by the membership degrees to clusters A2 and A4, obtained using the fuzzy c-varieties (FCV) algorithm (expression of objective function and distance enclosed)
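As a point of reference for the baseline algorithm in this abstract, the standard fuzzy c-means updates can be sketched as follows. This covers only FCM with Euclidean distance, not the GK or FCV variants; the function name, the fuzzifier value m = 2, the fixed iteration count, and the toy data are assumptions made for illustration.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, n_iter=50):
    """Standard FCM updates starting from caller-supplied centers.

    Returns the memberships computed in the last iteration and the
    centers updated from them.
    """
    centers = [tuple(c) for c in centers]
    exponent = 2.0 / (m - 1.0)
    dim = len(points[0])
    memberships = []
    for _ in range(n_iter):
        # Membership update: memberships[j][i] is the degree to which
        # point j belongs to cluster i; each row sums to 1.
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if 0.0 in dists:
                # The point coincides with a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                memberships.append([h / sum(hits) for h in hits])
            else:
                memberships.append(
                    [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                     for d_i in dists]
                )
        # Center update: mean of all points weighted by membership ** m.
        new_centers = []
        for i, old in enumerate(centers):
            weights = [u[i] ** m for u in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(old)  # no membership left: keep the center
                continue
            new_centers.append(
                tuple(sum(w * p[d] for w, p in zip(weights, points)) / total
                      for d in range(dim))
            )
        centers = new_centers
    return memberships, centers


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
memberships, centers = fuzzy_c_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Unlike a hard partition, every sample keeps a graded membership in each cluster, which is the kind of quantity the figure in this abstract plots for clusters A2 and A4.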

19.
This paper compares the performance of two clustering methods, DPClus graph clustering and hierarchical clustering, in classifying volatile organic compounds (VOCs) using a fingerprint-based similarity measure between chemical structures. The clustering results from each method were compared to determine the degree of cluster overlap and how well each method classified the chemical structures of VOCs into clusters. We also point out the advantages and limitations of both clustering methods. In conclusion, a chemical similarity measure can be used to predict the biological activities of a compound, and this can be applied in the medical, pharmaceutical and agrotechnology fields.

20.
Clustering analysis of data from DNA microarray hybridization studies is an essential task for identifying biologically relevant groups of genes. The attribute cluster algorithm (ACA) has provided an attractive way to group and select meaningful genes. However, ACA needs considerable prior knowledge about the genes to set the number of clusters. In practical applications, if the number of clusters is misspecified, the performance of ACA deteriorates rapidly. In this paper we propose the Cooperative Competition Cluster Algorithm (CCCA). The algorithm assumes that cooperation and competition exist simultaneously between clusters during the clustering process. Using this principle of cooperative competition, the number of clusters can be found in the course of clustering. Experimental results on synthetic data and gene expression data are presented. The results show that CCCA can choose the number of clusters automatically and achieves excellent performance relative to competing methods.
