Similar literature
 Found 20 similar articles; search time: 31 ms
1.
A method is presented to facilitate the non-target analysis of data obtained in temperature-programmed comprehensive two-dimensional (2D) gas chromatography coupled to time-of-flight mass spectrometry (GC×GC-ToF-MS). One main difficulty of GC×GC data analysis is that each peak is usually modulated several times and therefore appears as a series of peaks (or peaklets) in the one-dimensionally recorded data. The proposed method, 2DAid, uses basic chromatographic laws to calculate the theoretical shape of a 2D peak (a cluster of peaklets originating from the same analyte) in order to define the area in which the peaklets of each individual compound can be expected to show up. Based on analyte-identity information obtained by means of mass spectral library searching, the individual peaklets are then combined into a single 2D peak. The method is applied, among other samples, to a complex mixture containing 362 analytes. It is demonstrated that the 2D peak shapes can be accurately predicted and that clustering and further processing can reduce the final peak list to a manageable size.

2.
Four different two-dimensional fingerprint types (MACCS, Unity, BCI, and Daylight) and nine methods of selecting optimal cluster levels from the output of a hierarchical clustering algorithm were evaluated for their ability to select clusters that represent chemical series present in some typical examples of chemical compound data sets. The methods were evaluated using Ward's clustering algorithm on subsets of the publicly available National Cancer Institute HIV data set, as well as with compounds from our corporate data set. We make a number of observations and recommendations about the choice of fingerprint type and cluster level selection methods for use in this type of clustering.

3.
Fourier transform infrared (FTIR) microspectroscopy has been employed to investigate benign (ordinary dermal and Reed nevi), dysplastic and malignant (invasive melanoma) skin lesions through the analysis of spectral changes of melanocytes as well as the evaluation of the presence of melanin. Hierarchical cluster analysis and principal component analysis led to a satisfactory separation of malignant from dysplastic and normal melanocytes. When the clustering was enlarged with spectra from Reed and dermal nevi, the multivariate analysis also segregated the spectral data well into discrete clusters, yielding reliable average spectra for analysis, at the molecular level, of the main groups or components responsible for the biological and biochemical changes. The most significant spectral characteristics appear to be related to differences in secondary protein structures, in nucleic acid conformation, and in intra- and intermolecular bonding. In all cases, supervised and unsupervised spectral analyses resulted in satisfactory agreement with histopathological findings.

4.
Recently we proposed a new variable selection algorithm for classification problems, based on the concept of clustering of variables (CLoVA). Here the same concept is applied to a regression problem, and the results are compared with those of conventional variable selection strategies for PLS. The basic idea behind clustering of variables is that the instrument channels are grouped into different clusters by clustering algorithms. The spectral data of each cluster are then subjected to PLS regression. Different real data sets (Cargill corn, Biscuit dough, ACE QSAR, Soy, and Tablet) were used to evaluate the influence of the clustering of variables on the prediction performance of PLS. In almost all cases, the statistical parameters, especially the prediction error, show the superiority of CLoVA-PLS over other variable selection strategies. Finally, synergy clustering of variables (sCLoVA-PLS), which uses a combination of clusters, is proposed as an efficient modification of the CLoVA algorithm. The statistical parameters obtained indicate that variable clustering can separate the useful part of the data from the redundant part, and that a stable model can then be built on the informative clusters.

5.
The tremendous increase in chemical structure and biological activity data brought about through combinatorial chemistry and high-throughput screening technologies has created the need for sophisticated graphical tools for visualizing and exploring structure-activity data. Visualization plays an important role in exploring and understanding relationships within such multidimensional data sets. Many chemoinformatics software applications apply standard clustering techniques to organize structure-activity data, but they differ significantly in their approaches to visualizing clustered data. Molecular Property eXplorer (MPX) is unique in its presentation of clustered data in the form of heatmaps and tree-maps. MPX employs agglomerative hierarchical clustering to organize data on the basis of the similarity between 2D chemical structures or similarity across a predefined profile of biological assay values. Visualization of hierarchical clusters as tree-maps and heatmaps provides simultaneous representation of cluster members along with their associated assay values. Tree-maps convey both the spatial relationship among cluster members and the value of a single property (activity) associated with each member. Heatmaps provide visualization of the cluster members across an activity profile. Unlike a tree-map, however, a heatmap does not convey the spatial relationship between cluster members. MPX seamlessly integrates tree-maps and heatmaps to represent multidimensional structure-activity data in a visually intuitive manner. In addition, MPX provides tools for clustering data on the basis of chemical structure or activity profile, displaying 2D chemical structures, and querying the data over a specified activity range or a set of chemical structure criteria (e.g., Tanimoto similarity, substructure match, and "R-group" analysis).
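The Tanimoto similarity named among the query criteria above can be illustrated with a minimal sketch. It assumes each 2D fingerprint is represented as a set of "on" bit indices; the function name `tanimoto` and the example fingerprints are hypothetical and are not taken from MPX.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / union


# Hypothetical fingerprints: 2 shared bits out of 5 distinct bits.
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8}
print(tanimoto(fp1, fp2))  # 0.4
```

A clustering step would typically work on the dissimilarity `1 - tanimoto(a, b)`; that choice is also an assumption here, not something the abstract specifies.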

6.
Trianionic spin-quartet and tetraanionic spin-quintet molecular clusters derived from m-dibenzoylbenzene in solution were identified by CW-ESR/pulse-ESR based two-dimensional electron spin transient nutation spectroscopy, and their spin and clustering structures in the ground state were determined in terms of a D-tensor based phenomenological approach and DFT calculations. The molecular structures obtained semiempirically are supported by DFT-based quantum chemical calculations. The DFT calculations have been tested for a sodium ion bridged fluorenone-based cluster, [fluorenone⁻˙ {Na⁺(dme)₂}]₂, whose crystal structure was reported in the literature [H. Bock, H.-F. Herrmann, D. Fenske and H. Goesmann, Angew. Chem., Int. Ed. Engl., 1988, 27, 1067], reproducing the experimentally determined molecular structure of the dimer cluster. It is suggested that both the quartet and quintet clusters in the 2-MTHF glass and solution form cross-type structures with the two m-dibenzoylbenzene moieties in cis-configuration. A dianionic spin-triplet m-dibenzoylbenzene derivative was detected for the first time and its charge and spin densities were studied by quantum chemical calculations. The high-spin states of the open-shell entities under study were confirmed by X-band pulse-ESR based electron spin nutation spectroscopy in organic frozen glasses. The D values and other spin Hamiltonian parameters of all the polyanionic high-spin species were determined by the hybrid eigenfield spectral simulation for fine-structure ESR spectra. m-Dibenzoylbenzene provides pseudo-degenerate π-LUMOs arising from the topological symmetry of its π-electron network, and its dianion in the triplet ground state is a prototypical model for topologically-controlled genuinely organic ferromagnetic metals.

7.
This paper compares the performance of two clustering methods, DPClus graph clustering and hierarchical clustering, in classifying volatile organic compounds (VOCs) using a fingerprint-based similarity measure between chemical structures. The clustering results from each method were compared to determine the degree of cluster overlap and how well each method classified the chemical structures of VOCs into clusters. We also point out the advantages and limitations of both clustering methods. In conclusion, a chemical similarity measure can be used to predict the biological activities of a compound, and this can be applied in the medical, pharmaceutical and agrotechnology fields.

8.
This article presents a data analysis method for biomarker discovery in proteomics. In factor analysis-based discriminant models, the latent variables (LVs) are calculated from the response data measured at all employed instrument channels. Since some channels are irrelevant and their responses do not carry useful information, the extracted LVs contain mixed information from both useful and irrelevant channels. In this work, clustering of variables (CLoVA) based on unsupervised pattern recognition is suggested as an efficient method to identify the most informative spectral region, which is then used to construct a more predictive multivariate classification model. In the suggested method, the instrument channels (m/z values) are grouped into different clusters by a self-organizing map. Subsequently, the spectral data of each cluster are separately used as the input variables of classification methods such as partial least squares-discriminant analysis (PLS-DA) and extended canonical variate analysis (ECVA). The proposed method is evaluated by the analysis of two experimental data sets (ovarian and prostate cancer data sets). It is found that the proposed method is able to distinguish cancerous from healthy samples with much higher sensitivity and selectivity than conventional PLS-DA and ECVA methods.

9.
A new image analysis strategy is introduced to determine the composition and the structural characteristics of plant cell walls by combining Raman microspectroscopy and unsupervised data mining methods. The proposed method consists of three main steps: spectral preprocessing, spatial clustering of the image and finally estimation of spectral profiles of pure components and their weights. Point spectra of Raman maps of cell walls were preprocessed to remove noise and fluorescence contributions and compressed with PCA. Processed spectra were then subjected to k-means clustering to identify spatial segregations in the images. Cell wall images were reconstructed with cluster identities and each cluster was represented by the average spectrum of all the pixels in the cluster. Pure component spectra were estimated by spectral entropy minimization criteria with simulated annealing optimization. Two pure spectral estimates that represent lignin and carbohydrates were recovered and their spatial distributions were calculated. Our approach partitioned the cell walls into many sublayers, based on their composition, thus enabling composition analysis at subcellular levels. It also overcame the well-known problem that native lignin spectra in lignocellulosics have high spectral overlap with contributions from cellulose and hemicelluloses, thus opening up new avenues for microanalyses of monolignol composition of native lignin and carbohydrates without chemical or mechanical extraction of the cell wall materials.

10.
Bone consists of an organic and an inorganic matrix. During development, bone undergoes changes in its composition and structure. In this study we apply three different cluster analysis algorithms [K-means (KM), fuzzy C-means (FCM) and hierarchical clustering (HCA)], and discriminant analysis (DA) to infrared spectroscopic data from developing cortical bone with the aim of comparing their ability to correctly classify the samples into different age groups. Cortical bone samples from the mid-diaphysis of the humerus of New Zealand white rabbits from three different maturation stages (newborn (NB), immature (11 days-1 month old), mature (3-6 months old)) were used. Three clusters were obtained by KM, FCM and HCA methods on different spectral regions (amide I, phosphate and carbonate). The newborn samples were well separated (71-100% correct classifications) from the other age groups by all bone components. The mature samples (3-6 months old) were well separated (100%) from those of other age groups by the carbonate spectral region, while by the phosphate and amide I regions some samples were assigned to another group (43-71% correct classifications). The greatest variance in the results for all algorithms was observed in the amide I region. In general, FCM clustering performed better than the other methods, and the overall error was lower. The discriminant analysis results showed that by combining the clustering results from all three spectral regions, the ability to predict the correct age group for all samples increased (from 29-86% to 77-91%). This study is the first to compare several clustering methods on infrared spectra of bone. Fuzzy C-means clustering performed best, and its ability to study the degree of membership of samples in each cluster might be beneficial in future studies of medical diagnostics.
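The "degree of membership" that distinguishes fuzzy C-means from K-means can be sketched with the textbook membership update. This is an illustration under stated assumptions, not the implementation used in the study: Euclidean distance, a fuzzifier of m = 2, and the helper name `fcm_memberships` are all choices made here.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Membership of one point in each cluster (textbook fuzzy C-means update)."""
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with one or more centers belongs only to those centers.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)); the memberships sum to 1.
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


# Hypothetical 2D example: the point lies at distance 1 and 2 from the two centers.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
print(u)  # approximately [0.8, 0.2]
```

K-means would assign this point wholly to the first cluster. The fuzzy memberships keep the information that it also lies fairly close to the second, which is the property the abstract highlights.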

11.
Chemisorption of a methanol molecule onto a size-selected copper cluster ion, Cu(n)+ (n = 2-10), and subsequent reactions were investigated in a gas-beam geometry at a collision energy less than 2 eV in an apparatus based on a tandem-type mass spectrometer. Mass spectra of the product ions show that the following two reactions occur after chemisorption: dominant formation of Cu(n-1)+(H)(OH) (H(OH) formation) in the size range of 4-5 and that of Cu(n)O+ (demethanation) in the size range of 6-8, in addition to chemisorption alone in the size range larger than 9. Absolute cross sections for the chemisorption, the H(OH) formation, and the demethanation processes were measured as functions of cluster size and collision energy. Optimized structures of bare copper cluster ions, reaction intermediates, and products were calculated by use of a hybrid method (B3LYP) consisting of the molecular orbital and the density functional methods. The origin of the size-dependent reactivity was explained as a structural change of the clusters from two-dimensional to three-dimensional structures.

12.
A new method of imputation for left‐censored datasets is reported. This method is evaluated by examining datasets in which the true values of the censored data are known so that the quality of the imputation can be assessed both visually and by means of cluster analysis. Its performance in retaining certain data structures on imputation is compared with that of three other imputation algorithms by using cluster analysis on the imputed data. It is found that the new imputation method benefits a subsequent model‐based cluster analysis performed on the left‐censored data. The stochastic nature of the imputations performed in the new method can provide multiple imputed sets from the same incomplete data. The analysis of these provides an estimate of the uncertainty of the cluster analysis. Results from clustering suggest that the imputation is robust, with smaller uncertainty than that obtained from other multiple imputation methods applied to the same data. In addition, the use of the new method avoids problems with ill‐conditioning of group covariances during imputation as well as in the subsequent clustering based on expectation–maximization. The strong imputation performance of the proposed method on simulated datasets becomes more apparent as the groups in the mixture models are increasingly overlapped. Results from real datasets suggest that the best performance occurs when the requirement of normality of each group is fulfilled, which is the main assumption of the new method. Copyright © 2013 John Wiley & Sons, Ltd.

13.
Photosynthetic systems utilize hundreds of chlorophylls to collect sunlight and transport the energy to the reaction center with remarkably high quantum efficiency. However, the large size of the system, together with the complex interactions among its components, makes it extremely challenging to understand the dynamics of light harvesting in large photosynthetic systems. To shed light on this problem, we present a structure-based theoretical framework that can be used to calculate the transition rate matrix describing energy transport in photosynthetic systems, and network clustering methods that provide simplified coarse-grained models revealing key structures guiding the light harvesting process. We constructed an effective model for energy transport in a Photosystem II supercomplex and applied several network clustering methods to generate coarse-grained kinetic cluster models for the system. Furthermore, we evaluated the performance of the network clustering methods and show that a spectral clustering method and a minimum cut approach produce accurate coarse-grained models for the PSII-sc system. The results indicate that finding bottlenecks of energy transport is a crucial factor for reduced representations of photosynthetic light harvesting, and the overall work presented in this paper should provide a comprehensive theoretical framework to elucidate the dynamics of light harvesting in photosynthetic systems.

14.
Molecular dynamics (MD) is an essential tool for correlating collision cross-section data determined by ion mobility spectrometry (IMS) with candidate (calculated) structures. Conventional methods used for ion structure determination rely on comparing the measured cross-sections with the calculated collision cross-section for the lowest energy structure(s) taken from a large pool of candidate structures generated through multiple tiers of simulated annealing. We are developing methods to evaluate candidate structures from an ensemble of many conformations rather than the lowest energy structure. Here, we describe computational simulations and clustering methods to assign backbone conformations for singly-protonated ions of the model peptide (NH2-Met-Ile-Phe-Ala-Gly-Ile-Lys-COOH) formed by both MALDI and ESI, and compare the structures of MIFAGIK derivatives to test the ‘sensitivity’ of the cluster analysis method. Cluster analysis suggests that [MIFAGIK + H]+ ions formed by MALDI have a predominantly turn structure even though the low-energy ions prefer partial helical conformers. Although the ions formed by ESI have collision cross-sections that are different from those formed by MALDI, the results of cluster analysis indicate that the ions' backbone structures are similar. Chemical modifications (N-acetyl, methylester as well as addition of Boc or Fmoc groups) to MIFAGIK alter the distribution of various conformers; the most dramatic changes are observed for the [M + Na]+ ion, which shows a strong preference for random coil conformers owing to the strong solvation by the backbone amide groups.

15.
16.
The (1:1) clusters of 1,2,4,5-tetrafluorobenzene (TFB) with methanol and with 2,2,2-trifluoroethanol (TFE) were studied both experimentally and computationally. Through use of fluorescence-detected infrared spectroscopy, the (1:1) clusters were identified in supersonic jets. Intermolecular interactions in the clusters were characterized by the spectral shifts of the aromatic C-H stretching modes in the TFB moiety owing to the cluster formation. The molecular structures, stabilization energies, and vibrational frequencies of the clusters were computed at the MP2/6-31+G level. Both computational and experimental data indicate that an aromatic C-H...O hydrogen bond is present in the TFB-methanol cluster, while it is absent in the TFB-TFE cluster.

17.
This paper describes a novel clustering methodology for classifying over 700 conformations of a flexible analogue of GBR 12909, a dopamine reuptake inhibitor that has completed phase I clinical trials as a treatment for cocaine abuse. The major aspect of the clustering methodology includes an efficient data-conditioning scheme where a systematic feature extraction procedure based on the structural properties of the molecule was used to reduce the associated feature space. This allowed region-specific clustering that focused on individual pharmacophore elements of the molecule. For clustering of the reduced feature set, the fuzzy clustering partitional method was utilized. Due to the relational nature of the feature data, fuzzy relational clustering was employed, and it successfully detected natural groups defined by rotational minima around N(sp³)-C(sp³), O(sp³)-C(sp³), and C(sp³)-C(sp²) bonds. The proposed clustering methodology also employed several cluster validity measures, which corroborated the partitions produced by the clustering technique and agreed with the results of hierarchical clustering using the XCluster program. Representative structures which exhibited a reasonable spread of energies and showed good spatial coverage of the conformational space were identified for use as putative bioactive conformations in a future Comparative Molecular Field Analysis of GBR 12909 analogues. The clustering methodology developed here is capable of handling other computational chemistry problems, and the feature extraction technique can be easily generalized to other molecules.

18.
Comprehensive two-dimensional gas chromatography coupled to mass spectrometry is a powerful tool to analyze complex samples. For application of the technique in studies like biomarker discovery in which large sets of complex samples have to be analyzed, extensive preprocessing is needed to align the data obtained in several injections (analyses). We developed new alignment and clustering algorithms for this type of data. New in the current procedures is the consistent way in which the phenomenon referred to as wrap-around is treated. The data analysis problems associated with this phenomenon are solved by treating the 2D display as the surface of a three-dimensional cylinder. Based on this transformation we developed a new similarity metric for features as a function of both the cylindrical distance (reflecting similarity in chromatographic behavior) and of the mass spectral correlation (reflecting similarity in chemical structure). The concepts are used in warping and clustering, and include a protection against greedy warping.
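The cylinder idea described above can be sketched as a distance in which the second-dimension retention time wraps around once per modulation period. This is a simplified illustration only: the abstract does not give the authors' actual metric, so the function names, the scale parameters, and the Euclidean combination of the two axes are all assumptions made here, and the mass spectral correlation term is left out.

```python
import math


def wrapped_diff(t2_a, t2_b, period):
    """Shortest second-dimension separation on an axis that wraps every `period`."""
    d = abs(t2_a - t2_b) % period
    return min(d, period - d)


def cylinder_distance(a, b, period, scale1=1.0, scale2=1.0):
    """Distance between features a = (t1, t2) and b = (t1, t2) on the cylinder surface.

    scale1 and scale2 are hypothetical normalisation factors for the two axes.
    """
    d1 = (a[0] - b[0]) / scale1
    d2 = wrapped_diff(a[1], b[1], period) / scale2
    return math.hypot(d1, d2)


# With a hypothetical 6 s modulation period, peaks at 1 s and 5 s in the
# second dimension are 2 s apart around the cylinder, not 4 s.
print(wrapped_diff(1, 5, 6))  # 2
```

On a flat 2D display the same two peaks would appear at opposite edges of the plot, which is the kind of wrap-around artefact the cylindrical treatment is meant to remove.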

19.
Accelerated K-means clustering in metric spaces   Cited 1 time in total (self-citations: 0; citations by others: 1)
The K-means method is a popular technique for clustering data into k partitions. In the adaptive form of the algorithm, Lloyd's method, an iterative procedure alternately assigns cluster membership based on a set of centroids and then redefines the centroids based on the computed cluster membership. The most time-consuming part of this algorithm is determining which cluster center each of the points being clustered belongs to. This paper discusses the use of the vantage-point tree as a method of more quickly assigning cluster membership when the points being clustered belong to intrinsically low- and medium-dimensional metric spaces. Results will be discussed from simulated data sets and real-world data in the clustering of molecular databases based upon physicochemical properties. Comparisons will be made to a highly optimized brute-force implementation of Lloyd's method and to other pruning strategies.
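The alternation between assignment and centroid update in Lloyd's method can be sketched as follows. This is a plain brute-force version for illustration, assuming points with Euclidean coordinates. It does not implement the vantage-point tree acceleration discussed in the paper, and the function name `lloyd_kmeans` and its parameters are hypothetical.

```python
import math


def lloyd_kmeans(points, centroids, max_iter=100):
    """Lloyd's method with brute-force assignment; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        # This is the costly part that pruning strategies aim to speed up.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # memberships are stable, so the centroids would not move
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


# Hypothetical data: two well-separated pairs of points.
pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, labels = lloyd_kmeans(pts, [(0, 0), (10, 10)])
print(centers)  # [(0.0, 0.5), (10.0, 10.5)]
print(labels)   # [0, 0, 1, 1]
```

The assignment step compares every point with every centroid, so its cost grows with the product of the two counts. That comparison is what a spatial index such as the vantage-point tree is intended to prune.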

20.
A new compound, nigeglanine (1), and its new artificial derivative (1a), were isolated from the seeds of Nigella glandulifera, together with a known aporphine alkaloid, fuzitine (2). Their structures were established by spectral analysis, including two-dimensional (2D)-NMR spectroscopy. Nigeglanine (1) is the third natural product determined to contain an indazole nucleus.

