Similar literature
20 similar articles found (search time: 0 ms)
1.
A new approach to the interpretation of spectra with “fuzzy sets” is described. A computer program CIF (Compound Identification with Fuzzy sets) is applied. This program is capable of finding components in a mixture by comparing the sample spectrum with reference spectra in a library. The applications discussed involve the interpretation of infrared spectra. The problems of spectral library search are discussed, an elementary introduction to fuzzy set theory is given, and applications to spectral library search are demonstrated.

2.
Three-dimensional (3D)-database searches are now being widely applied to determine potential new active molecules. Many structural data sets obtained as a result of these searches are still large in size. In this paper we apply molecular similarity calculations as a rapid method to screen two such data sets. In the first investigation, synthetic candidates, produced as a result of a tendamistat β-turn mimic search, were tested for their ability to imitate the β-turn backbone. In the second study, structures extracted through a histamine pharmacophore query search were examined on the basis of their electronic similarity to histamine. Molecular similarity is shown to provide a rapid means of gaining insight into the composition of molecular data sets, with possible implications for future full 3D-database searches.

3.
4.
A brief survey is given of the most recent publications on the development of artificial-intelligence systems for molecular spectral analysis. A new approach to solving the problems of qualitative molecular spectral analysis is based on an applied logical calculus developed by the authors for fuzzy predicates. It is suggested that spectral-structural knowledge should be specified in the language of fuzzy predicates, and that mechanical theorem-proving procedures should be used for solving qualitative problems of spectral analysis, the initial information being considered as a set of axioms. System-oriented matters are given consideration. The formalism suggested is a basis for the development of an artificial-intelligence dialogue system capable of solving various problems in molecular spectral analysis while maintaining a dialogue with a research worker using a professionally restricted natural language.

5.
A fuzzy c-means clustering algorithm is presented which is much faster than the traditional algorithm for data sets in which the number of features is significantly larger than the number of feature vectors. The algorithm is constructed by utilizing the covariance structure of feature vectors and cluster centers. By using results from a previous clustering, modified versions of the new algorithm achieve additional reductions in floating point operations. © 1995 by John Wiley & Sons, Inc.
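For orientation, the traditional fuzzy c-means iteration that this abstract uses as its baseline can be sketched as follows. This is a minimal illustration of the textbook update rules only; it does not reproduce the faster covariance-based algorithm, whose details are not given here. The function names, the fuzzifier default `m=2.0`, and the fixed iteration count are assumptions made for the example.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update: u[i][k] for point i and cluster k."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # A point sitting exactly on a center belongs fully to it
            # (shared equally if several centers coincide with the point).
            zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([z / sum(zeros) for z in zeros])
            continue
        memberships.append(
            [1.0 / sum((dk / dj) ** exponent for dj in dists) for dk in dists]
        )
    return memberships

def fcm_centers(points, memberships, m=2.0):
    """Standard FCM center update: mean of the points weighted by u**m."""
    dim = len(points[0])
    centers = []
    for k in range(len(memberships[0])):
        weights = [u[k] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total
             for d in range(dim)]
        )
    return centers

def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate the two updates for a fixed number of iterations (assumed)."""
    for _ in range(iterations):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return centers, fcm_memberships(points, centers, m)
```

Each pass of the center update touches every feature of every point, which is why the cost of the traditional algorithm grows with the number of features, the case the abstract targets.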

6.
Algorithms are described for correlating a proposed molecular structure with a mass spectrum. All molecular substructures of a proposed structure are determined which have the same masses as the fragment ions. The most likely fragment ion structures are those molecular substructures formed with the fewest number of bond cleavages in the proposed structure. The algorithms, which incorporate methods for handling rearrangement and adduct ions, utilize either nominal or exact data originating from any ionization method. The algorithms are demonstrated using the mass spectra of a substituted azetidinyl ketone and the macrolide antibiotic avermectin A1a.
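The core idea, finding connected substructures whose mass equals a fragment ion mass and ranking them by the number of bonds cleaved, can be illustrated with a brute-force sketch. This is not the authors' algorithm: it assumes nominal masses, represents the molecule as a hypothetical list of heavy-atom groups (hydrogens folded into each group mass), and ignores rearrangement and adduct ions entirely.

```python
from itertools import combinations

def is_connected(subset, bonds):
    """True if the atoms in `subset` form one connected piece via `bonds`."""
    subset = set(subset)
    seen, stack = set(), [next(iter(subset))]
    while stack:
        atom = stack.pop()
        if atom in seen:
            continue
        seen.add(atom)
        for i, j in bonds:
            if i == atom and j in subset:
                stack.append(j)
            elif j == atom and i in subset:
                stack.append(i)
    return seen == subset

def candidate_fragments(group_masses, bonds, ion_mass):
    """List (bonds_cleaved, atom_indices) for every connected substructure
    whose nominal mass equals ion_mass, fewest cleavages first.
    Exhaustive enumeration: only practical for very small molecules."""
    n = len(group_masses)
    hits = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if sum(group_masses[a] for a in subset) != ion_mass:
                continue
            if not is_connected(subset, bonds):
                continue
            # A bond is cleaved when exactly one of its ends is in the fragment.
            cleaved = sum(1 for i, j in bonds if (i in subset) != (j in subset))
            hits.append((cleaved, subset))
    return sorted(hits)

# Ethanol as CH3 (15), CH2 (14), OH (17) with two single bonds.
ethanol_masses = [15, 14, 17]
ethanol_bonds = [(0, 1), (1, 2)]
print(candidate_fragments(ethanol_masses, ethanol_bonds, 31))
```

For the ethanol example the only substructure of nominal mass 31 is CH2OH, formed by cleaving one bond.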

7.
Producing good low‐dimensional representations of high‐dimensional data is a common and important task in many data mining applications. Two methods that have been particularly useful in this regard are multidimensional scaling and nonlinear mapping. These methods attempt to visualize a set of objects described by means of a dissimilarity or distance matrix on a low‐dimensional display plane in a way that preserves the proximities of the objects to whatever extent is possible. Unfortunately, most known algorithms are of quadratic order, and their use has been limited to relatively small data sets. We recently demonstrated that nonlinear maps derived from a small random sample of a large data set exhibit the same structure and characteristics as those of the entire collection, and that this structure can be easily extracted by a neural network, making possible the scaling of data sets orders of magnitude larger than those accessible with conventional methodologies. Here, we present a variant of this algorithm based on local learning. The method employs a fuzzy clustering methodology to partition the data space into a set of Voronoi polyhedra, and uses a separate neural network to perform the nonlinear mapping within each cell. We find that this local approach offers a number of advantages, and produces maps that are virtually indistinguishable from those derived with conventional algorithms. These advantages are discussed using examples from the fields of combinatorial chemistry and optical character recognition. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 373–386, 2001

8.
Autoclaving was used to manipulate nutrient utilization and availability. The objectives of this study were to characterize any changes of the functional groups mainly associated with lipid structure in flaxseed (Linum usitatissimum, cv. Vimy) that occurred on a molecular level during the treatment process, using Fourier transform infrared molecular spectroscopy. The parameters included lipid CH₃ asymmetric (ca. 2959 cm⁻¹), CH₂ asymmetric (ca. 2928 cm⁻¹), CH₃ symmetric (ca. 2871 cm⁻¹) and CH₂ symmetric (ca. 2954 cm⁻¹) functional groups, the lipid carbonyl C=O ester group (ca. 1745 cm⁻¹), and the lipid unsaturation group (CH attached to C=C) (ca. 3010 cm⁻¹), as well as their ratios. Hierarchical cluster analysis (CLA) and principal components analysis (PCA) were conducted to identify molecular spectral differences. Flaxseed samples were kept raw for the control or autoclaved in batches at 120°C for 20, 40 or 60 min for treatments 1, 2 and 3, respectively. Molecular spectral analysis of lipid functional group ratios showed a significant decrease (P<0.05) in the CH₂ asymmetric to CH₃ asymmetric stretching band peak intensity ratios for the flaxseed. There were linear and quadratic effects (P<0.05) of the treatment time from 0, 20, 40 and 60 min on the ratios of the CH₂ asymmetric to CH₃ asymmetric stretching vibration intensity. Autoclaving had no significant effect (P>0.05) on the lipid carbonyl C=O ester group and the lipid unsaturation group (CH attached to C=C) (with average spectral peak area intensities of 138.3 and 68.8 IR intensity units, respectively). Multivariate molecular spectral analyses, CLA and PCA, were unable to make distinctions between the original spectra of the different treatments in the CH₃ and CH₂ asymmetric and symmetric region (ca. 2988-2790 cm⁻¹). The results indicated that autoclaving altered the mid-infrared molecular spectrum of flaxseed, and that the spectrum can be used to identify heat-induced changes in lipid conformation. A future study is needed to quantify the relationship between lipid molecular structure changes and functionality/availability.

9.
The supervised principal components (SPC) method was proposed by Bair and Tibshirani for statistical regression problems where the number of variables greatly exceeds the number of samples. This case is extremely common in multivariate spectral analysis. The objective of this research is to apply SPC to near‐infrared and Raman spectral calibration. SPC is similar to traditional principal components analysis except that it selects the most significant wavelengths from the high‐dimensional spectral data according to a semi‐supervised strategy, which can reduce the risk of overfitting and the effect of collinearity in modeling. In this study, four conventional regression methods, namely principal component regression, partial least squares regression, ridge regression, and support vector regression, were compared with SPC. Three evaluation criteria, coefficient of determination (R2), external correlation coefficient (Q2), and root mean square error of prediction, were calculated to evaluate the performance of each algorithm on both near‐infrared and Raman datasets. The comparison results illustrated that the SPC model had a desirable ability of regression and prediction. We believe that this method might be an alternative method for multivariate spectral analysis. Copyright © 2013 John Wiley & Sons, Ltd.

10.
11.
Multivariate analysis of time-resolved pyrolysis/mass spectrometric data is described. The approach is based on the variance diagram (VARDIA), a recently developed technique that quantifies the clustering of variables in two-dimensional factor analysis (sub)-spaces in a rotational scanning procedure. A maximum in the VARDIA plot indicates a correlated behavior of the mass variables, indicating a common origin. This common origin is generally caused by a change in the concentration of a chemical component. With this information the “factor spectrum” and the scores of the component can be retrieved. For time-resolved serial data, consideration of the clustering behavior of the variables as a function of time is more appropriate than a rotational scanning procedure. Adaptation of the VARDIA for serial data, such as time-resolved data, is described. This approach has the advantage that all the factors can be used. It will be shown that the resolution of the obtained curve is higher than that of the total ion current curve as a function of time. Examples will be given for time-resolved data of coal, rubber and wood samples.

12.
In recent years classifiers generated with kernel-based methods, such as support vector machines (SVM), Gaussian processes (GP), regularization networks (RN), and binary kernel discrimination (BKD), have been very popular in chemoinformatics data analysis. Aizerman et al. were the first to introduce the notion of employing kernel-based classifiers in the area of pattern recognition. Their original scheme, which they termed the potential function method (PFM), can basically be viewed as a kernel-based perceptron procedure and arguably subsumes the modern kernel-based algorithms. PFM can be computationally much cheaper than modern kernel-based classifiers; furthermore, PFM is far simpler conceptually and easier to implement than the SVM, GP, and RN algorithms. Unfortunately, unlike, e.g., SVM, GP, and RN, PFM is not endowed with both theoretical guarantees and practical strategies to safeguard it against generating overfitting classifiers. This is, in our opinion, the reason why this simple and elegant method has not been taken up in chemoinformatics. In this paper we empirically address this drawback: while maintaining its simplicity, we demonstrate that PFM combined with a simple regularization scheme may yield binary classifiers that can be, in practice, as efficient as classifiers obtained by employing state-of-the-art kernel-based methods. Using a realistic classification example, the augmented PFM was used to generate binary classifiers. Using a large chemical data set, the generalization ability of PFM classifiers was then compared with the prediction power of Laplacian-modified naive Bayesian (LmNB), Winnow (WN), and SVM classifiers.
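Since the abstract describes PFM as essentially a kernel-based perceptron, a plain kernel perceptron gives a feel for how simple the procedure is. The sketch below is a generic kernel perceptron with a Gaussian kernel, not the authors' augmented PFM: the regularization scheme they add is not specified here and is deliberately left out. The function names, the `gamma` value, and the epoch limit are assumptions for illustration.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel, playing the role of the potential function."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def decision_score(x, samples, labels, alpha, kernel=rbf_kernel):
    """Weighted sum of kernels centered on the training samples."""
    return sum(a * t * kernel(s, x) for a, t, s in zip(alpha, labels, samples))

def train_kernel_perceptron(samples, labels, kernel=rbf_kernel, epochs=20):
    """Return per-sample mistake counts alpha; labels must be +1 or -1."""
    alpha = [0] * len(samples)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(samples, labels)):
            if y * decision_score(x, samples, labels, alpha, kernel) <= 0:
                alpha[i] += 1  # add a potential centered on the misclassified sample
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return alpha

def predict(x, samples, labels, alpha, kernel=rbf_kernel):
    """Classify x as +1 or -1 from the sign of the decision score."""
    return 1 if decision_score(x, samples, labels, alpha, kernel) > 0 else -1

# XOR-like toy data, which a linear perceptron cannot separate.
xor_samples = [(0, 0), (1, 1), (0, 1), (1, 0)]
xor_labels = [1, 1, -1, -1]
xor_alpha = train_kernel_perceptron(xor_samples, xor_labels)
```

Training only increments a counter on each mistake, so there is no optimization problem to solve. The lack of any built-in safeguard against overfitting is the drawback the abstract sets out to address.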

13.
An instrumentation variation on laser-induced breakdown spectroscopy (LIBS) is described that allows simultaneous determination of all detectable elements using a multiple spectrograph and synchronized, multiple CCD spectral acquisition system. The system is particularly suited to the rapid analysis of heterogeneous materials such as coal and mineral ores. For the analysis of a heterogeneous material the acquisition cycle typically stores 1000 spectra for subsequent filtering and analysis. The incorporation of an effective data analysis methodology has been critical in achieving both accurate and reproducible results in the analysis of powders with the technology. Using naturally occurring gypsum as the optimization matrix, various data analysis techniques have been investigated, including pulse-to-pulse internal standardisation, data filtering, and spectral deconvolution. The incorporation of normalization of the elemental emission to the total plasma emission intensity has been found to yield the single biggest improvement in accuracy and precision. Spectral deconvolution has been found to yield further improvement and is particularly relevant to the analysis of complex materials such as black coal. The use of pulse-to-pulse intensity normalization has the further benefit of extending the period between instrument recalibration, thus enhancing the ease of use of the device. The benefit of the optimized data analysis methodology is revealed in the determination of eight elemental components of gypsum (Na, Ca, Mg, Fe, Al, Si, Ti and K) where a typical absolute analysis accuracy of ±10% is obtained. These results compare favourably to analysis by conventional techniques for these materials. The analysis accuracy and repeatability are further demonstrated by the determination of the concentrations of these elements in a black coal sample.
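The pulse-to-pulse normalization described above can be sketched in a few lines. This is an illustration under a simplifying assumption: the total plasma emission intensity is approximated by the sum over all recorded channels of a single-pulse spectrum. The function names and the list-of-intensities layout are hypothetical and do not come from the instrument software.

```python
def normalize_pulse(spectrum):
    """Divide each channel by the summed intensity of this pulse's spectrum."""
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum has no positive total intensity")
    return [value / total for value in spectrum]

def mean_normalized_line(spectra, channel):
    """Average the normalized intensity of one emission-line channel
    over all stored single-pulse spectra."""
    normalized = [normalize_pulse(s)[channel] for s in spectra]
    return sum(normalized) / len(normalized)

# Two pulses of the same material, the second twice as bright overall.
pulses = [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0]]
line_signal = mean_normalized_line(pulses, channel=1)
```

Because each pulse is scaled by its own total, shot-to-shot changes in plasma brightness cancel out of the line signal, which is consistent with the improved precision the abstract reports.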

14.
Airborne particulate matter is an important component of atmospheric pollution, affecting human health, climate, and visibility. Modern instruments allow single particles to be analyzed one-by-one in real time, and offer the promise of determining the sources of individual particles based on their mass spectral signatures. The large number of particles to be apportioned makes clustering a necessary step. The goal of this study is to use mass spectral data to compare the accuracy and speed of several clustering algorithms: ART-2a, several variants of hierarchical clustering, and K-means. Repeated simulations with various algorithms and different levels of data preprocessing suggest that hierarchical clustering methods using derivatives of Ward's algorithm discriminate sources with fewer errors than ART-2a, which itself discriminates much better than point-wise hierarchical clustering methods. In most cases, K-means algorithms do almost as well as the best hierarchical clustering. These efficient algorithms (clustering derived from Ward's algorithm, ART-2a and K-means) are most accurate when the relative peak areas have been pre-scaled by taking the square root. Analysis times vary within a factor of 30, and when accuracy above 95% is required, run times scale up as the square of the number of particles. Algorithms derived from Ward's remain the most accurate under a wide range of conditions and conversely, for an equal accuracy, can deliver a shorter list of clusters, allowing faster and maybe on-the-fly classification.
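The square-root pre-scaling mentioned above is easy to state concretely. The sketch below assumes each particle is given as a list of non-negative peak areas and that "relative" means divided by the particle's total peak area. The function name is hypothetical, and the exact preprocessing pipeline of the study is not described here.

```python
import math

def sqrt_scale(peak_areas):
    """Convert raw peak areas to relative areas, then take the square root."""
    total = sum(peak_areas)
    if total <= 0:
        raise ValueError("particle spectrum has no positive peak area")
    return [math.sqrt(area / total) for area in peak_areas]

# One hypothetical particle spectrum with four mass channels.
scaled = sqrt_scale([1.0, 4.0, 4.0, 0.0])
```

The transform compresses the dynamic range, so that a few dominant peaks weigh less in distance calculations. As a side effect, the scaled vector has unit Euclidean length, because its squared entries are the relative areas and these sum to one.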

15.
The analysis of high-resolution NMR spectra has been known for many years. Hardware and software development now permits the implementation of such programs on personal computers. The structural information hidden in complex proton NMR spectra becomes easily accessible by using graphical user interfaces and direct data exchange between programs. A new mode has been implemented in 1D WIN-NMR to support the analysis of multiplet patterns with first order rules. Structure display, direct export mechanisms to the simulation program WIN-DAISY, and an archiving possibility complete the state-of-the-art data analysis. Some practical examples are given. Received: 25 October 1996/Revised: 6 March 1997/Accepted: 10 March 1997

16.
An evolving factor analysis procedure with concentration constraints (gradient concentration window) was applied to the analysis of data sets of aqueous Fourier transform infrared (FT-IR) spectra of carboxylic acids (acetic, malonic and succinic acids) collected in experiments with varying pH. Besides the calculation of the number of acid-base systems, this procedure allowed the calculation of the FT-IR spectra of the acid-base species present in equilibrium as well as the corresponding pKa values.

17.
18.
The choice of basis set in quantum chemical calculations can have a huge impact on the quality of the results, especially for correlated ab initio methods. This article provides an overview of the development of Gaussian basis sets for molecular calculations, with a focus on four popular families of modern atom‐centered, energy‐optimized bases: atomic natural orbital, correlation consistent, polarization consistent, and def2. The terminology used for describing basis sets is briefly covered, along with an overview of the auxiliary basis sets used in a number of integral approximation techniques and an outlook on possible future directions of basis set design. © 2012 Wiley Periodicals, Inc.

19.
Two traditional clustering algorithms are applied to configurations from a long molecular dynamics trajectory and compared using two sets of test data. First, a subset of atoms was chosen to present conformations which naturally fall into a number of clusters. Second, a subset of atoms was selected to span a relatively continuous region of conformational space rather than form discrete conformational classes. Of the two algorithms used, the single linkage method is inappropriate for this kind of data. The divisive hierarchical method, based on minimizing the difference between cluster centroids and extrema, is successful but also prone to imposing clustering hierarchy where none can be justified. © 1994 by John Wiley & Sons, Inc.

20.
The masses of ions observed in the mass spectrum of a pure compound are correlated with the masses of the molecular substructures of the compound. Three methods are described for generating molecular substructures. Each method is evaluated to establish how effectively it generates the molecular substructures and correlates the masses of the molecular substructures with the masses of the observed fragment ions. Rules for mass-spectral fragmentation processes are incorporated into the mass spectral analysis software and illustrated for retro-aldol and lactone-ester reactions occurring in the thermospray mass spectra of oligomycin antibiotics.
