Similar Documents
20 similar documents found
1.
In recent years, numerous pattern recognition methods have been tested for the automatic interpretation of physicochemical data. Classifiers have been used successfully, especially with low-resolution mass spectra. However, the evaluation of spectral classifiers (‘percentage of correctly classified spectra’) was often insufficiently defined mathematically. In this paper, basic principles of probability theory and information theory are used to derive objective criteria for binary classifiers. A classifier is an algorithm that uses a pattern vector (mass spectrum) and a priori probabilities for the classes (chemical structures) to which this vector may belong; the classification results are a posteriori probabilities for the classes. The predictive abilities for both classes, or the information gain, are suitable objective criteria for comparing classifiers. Mathematical formulae are given and explained with examples from mass spectrometry.
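A minimal Python sketch of the quantities named above, assuming a binary classifier whose results are summarised as a 2×2 table of counts (`tp`, `fn`, `fp`, `tn` are hypothetical names) and taking the mutual information between true class and classifier answer as one common formalisation of "information gain". The paper's own formulae may be defined differently.

```python
import math

def posterior(prior1, lik1, lik2):
    """Bayes' rule: a posteriori probability of class 1 for one pattern vector,
    given the a priori probability of class 1 and the two class likelihoods."""
    num = prior1 * lik1
    return num / (num + (1.0 - prior1) * lik2)

def predictive_abilities(tp, fn, fp, tn):
    """Fractions of class-1 and class-2 patterns that are classified correctly."""
    return tp / (tp + fn), tn / (tn + fp)

def information_gain(tp, fn, fp, tn):
    """Mutual information (bits) between the true class and the classifier answer."""
    n = tp + fn + fp + tn
    joint = [[tp / n, fn / n], [fp / n, tn / n]]  # rows: true class; columns: answer
    p_true = [sum(row) for row in joint]
    p_answer = [joint[0][j] + joint[1][j] for j in range(2)]
    gain = 0.0
    for i in range(2):
        for j in range(2):
            if joint[i][j] > 0:
                gain += joint[i][j] * math.log2(joint[i][j] / (p_true[i] * p_answer[j]))
    return gain
```

A perfect classifier on balanced classes yields 1 bit, while a classifier whose answers are independent of the true class yields 0 bits, regardless of its "percentage correct".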

2.
In this paper, we propose a reduced representation of molecules of pharmacological interest based on their chemical functions. The representations are obtained by a topological analysis of the molecules' electron density maps at medium resolution, leading to graphs of critical points. The distributions of the different types of critical points are compared at various resolution levels for a training set of 22 molecules in order to find the resolution level that gives the best representation of the various chemical functions. In future, these reduced representations can be used for molecular similarity research and pharmacophore proposals.
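To illustrate how resolution changes the set of critical points, here is a one-dimensional sketch, not the authors' 3-D topological analysis: a "density" is modelled as a sum of atom-centred Gaussians whose width `sigma` stands in for resolution, and maxima and minima are located on a uniform grid. The Gaussian model, the grid, and the function names are assumptions made for illustration.

```python
import math

def density(x, centres, sigma):
    """Sum of unit-height Gaussians; a larger sigma mimics a lower resolution."""
    return sum(math.exp(-(x - c) ** 2 / (2 * sigma ** 2)) for c in centres)

def critical_points(centres, sigma, lo=-3.0, hi=4.0, step=0.01):
    """Locate the maxima and minima of the 1-D density on a uniform grid."""
    n = int(round((hi - lo) / step)) + 1
    xs = [lo + i * step for i in range(n)]
    f = [density(x, centres, sigma) for x in xs]
    maxima = [xs[i] for i in range(1, n - 1) if f[i - 1] < f[i] > f[i + 1]]
    minima = [xs[i] for i in range(1, n - 1) if f[i - 1] > f[i] < f[i + 1]]
    return maxima, minima
```

Two centres 1.0 apart give two maxima and one minimum at `sigma = 0.2`, but a single maximum at `sigma = 1.0`, which mirrors how distinct features merge as the resolution is lowered.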

3.
An efficient program that runs on a personal computer and handles the storage, retrieval, and processing of chemical information is presented. The program can work either as a stand-alone application or in conjunction with a specifically written Web server application or with standard SQL servers, e.g., Oracle, Interbase, and MS SQL. New types of data fields are introduced, e.g., arrays for storing spectral information, HTML and database links, and user-defined functions. CheD has an open architecture; thus, custom data types, controls, and services may be added. A WWW server application for chemical data retrieval features easy, user-friendly installation on Windows NT or 95 platforms.

4.
In the last couple of years, an overwhelming amount of data has emerged in the field of biomolecular structure determination. Clustering techniques can be used to explore the information hidden in these structure databases. The outcome of a clustering experiment depends largely, among other things, on the way the data are represented; therefore, the choice of how to represent the molecular structure information is extremely important. This article describes the influence of different representations on the clustering and how it can be analyzed by means of a dendrogram comparison method. All experiments are performed on a data set of RNA trinucleotides. Besides the most basic structure representation, Cartesian coordinates, several other structure representations are used.
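One common way to compare the dendrograms produced by two representations of the same structures is the cophenetic correlation: the Pearson correlation between the merge heights of every pair of items in the two trees. The sketch below assumes single-linkage clustering on Euclidean distances; the article's own comparison method may differ, and all names are hypothetical.

```python
import math
from itertools import combinations

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cophenetic(points):
    """Single-linkage clustering; returns the merge height for every pair of items."""
    clusters = [{i} for i in range(len(points))]
    coph = {}
    while len(clusters) > 1:
        # Closest pair of clusters (single linkage = smallest pairwise distance).
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(euclid(points[i], points[j]) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        for i in clusters[a]:
            for j in clusters[b]:
                coph[frozenset((i, j))] = d
        clusters[a] = clusters[a] | clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return coph

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    return cov / (su * sv)

def compare_dendrograms(rep_a, rep_b):
    """Correlate the cophenetic distances obtained from two representations of the same items."""
    ca, cb = cophenetic(rep_a), cophenetic(rep_b)
    keys = sorted(ca, key=sorted)
    return pearson([ca[k] for k in keys], [cb[k] for k in keys])
```

A representation that only rescales the coordinates gives a correlation of 1, whereas one that groups the items differently gives a lower value.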

5.
6.
We present a low-rank moment expansion of the linear density-density response function. The general interacting (fully nonlocal) density-density response function is calculated by means of its spectral decomposition via an iterative Lanczos diagonalization technique within linear density functional perturbation theory. We derive a unitary transformation in the space of the eigenfunctions that yields subspaces with well-defined moments. This transformation generates the irreducible representations of the density-density response function with respect to rotations within SO(3). This makes it possible to separate the contributions to the electronic response density from different multipole moments of the perturbation. Our representation maximally condenses the physically relevant information of the density-density response function required for intermolecular interactions, yielding a considerable reduction in dimensionality. We illustrate the performance and accuracy of our scheme by computing the electronic response density of a water molecule to a complex interaction potential. © 2015 Wiley Periodicals, Inc.
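The Lanczos step mentioned above can be sketched generically. Starting from a vector, the method builds an orthonormal Krylov basis in which a symmetric operator becomes tridiagonal, and a few steps already capture the extreme part of its spectrum. What follows is a plain textbook Lanczos iteration with full reorthogonalisation on a small dense matrix, assumed for illustration; it is not the authors' response-function code.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lanczos(A, v0, k):
    """Up to k Lanczos steps with full reorthogonalisation.

    Returns the diagonal (alphas) and off-diagonal (betas) of the tridiagonal
    matrix T = Q^T A Q, whose eigenvalues approximate those of the symmetric A.
    """
    norm = math.sqrt(dot(v0, v0))
    Q = [[x / norm for x in v0]]
    alphas, betas = [], []
    for j in range(k):
        w = matvec(A, Q[j])
        alphas.append(dot(w, Q[j]))
        # Subtract projections on all previous Lanczos vectors (numerically robust).
        for q in Q:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        beta = math.sqrt(dot(w, w))
        if j == k - 1 or beta < 1e-12:
            break
        betas.append(beta)
        Q.append([wi / beta for wi in w])
    return alphas, betas
```

Running all n steps makes T similar to A (so, for example, the traces agree); stopping at k much smaller than n gives the kind of low-dimensional representation the abstract refers to.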

7.
Optical simulations make it possible to model an entire chemical gas-sensing platform based on hollow waveguides (HWGs) operating in the mid-infrared spectral regime, using a three-dimensional representation of the sensor components and taking the spectral response to virtual analytes into account. Furthermore, a strategy for including the spectral response of dielectrically coated HWGs is demonstrated. Using experimental spectroscopic data recorded under well-defined conditions, the complex refractive indices of selected target analytes (i.e., methane, butane, and isobutylene) were derived with a refined harmonic oscillator model. In turn, these parameters made it possible to assign the dielectric functions of these analytes directly to virtual objects representing the analyte within the modeled sensor setup. In a subsequent step, spectroscopic sensor response functions were simulated as absorbance spectra across selected wavelength regimes using spectrally resolved ray-tracing techniques and were compared with experimental data.
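This sketch implements the classical damped (Lorentz) harmonic oscillator dielectric function and derives the complex refractive index n + ik from it. The abstract only says that a "refined" oscillator model was used, so the plain Lorentz form, its sign convention (positive imaginary part for absorption), and the parameter values in the usage are assumptions.

```python
import cmath

def lorentz_epsilon(omega, eps_inf, oscillators):
    """Dielectric function of a sum of damped harmonic (Lorentz) oscillators.

    oscillators: iterable of (strength, omega0, gamma), all in the same
    wavenumber units as omega.
    """
    eps = complex(eps_inf, 0.0)
    for s, w0, g in oscillators:
        eps += s * w0 ** 2 / (w0 ** 2 - omega ** 2 - 1j * g * omega)
    return eps

def refractive_index(omega, eps_inf, oscillators):
    """Complex refractive index n + ik = sqrt(epsilon), principal branch."""
    return cmath.sqrt(lorentz_epsilon(omega, eps_inf, oscillators))
```

With a single illustrative oscillator `(0.01, 3000.0, 20.0)`, the extinction coefficient k is large at 3000 and small far from resonance, for example at 1000.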

8.
Fourier transforms occur in a variety of chemical systems and processes; examples include obtaining spectral information from correlation functions, energy relaxation processes, and spectral densities obtained from force autocorrelation functions. In this article, a new functional transform, named the dual propagation inversion (DPI), is introduced. The DPI functional transform can be applied to a variety of problems in chemistry, such as Fourier transforms of time correlation functions, energy relaxation processes, and rate theory. The present illustrative application is the generation of the frequency representation of a discrete, truncated time-domain signal. The DPI result is compared with the traditional Fourier transform applied to the same truncated time signal. For both noise-free and noise-corrupted time-truncated signals, the DPI spectrum is found to be more accurate, particularly when the signal is more severely truncated. In the DPI, the distributed-approximating-functional free propagator is used to propagate and denoise the signal simultaneously. Received: 30 January 2000 / Accepted: 6 July 2000 / Published online: 23 November 2000
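The DPI itself is not reproduced here. As a hedged illustration of the baseline it is compared against, the sketch applies a plain discrete Fourier transform to a damped cosine that is cut off after `n_keep` samples and zero-filled. The signal model and parameters are assumptions. Truncation broadens the spectral line, which is the regime where the DPI is reported to be more accurate.

```python
import cmath
import math

def dft(signal):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def truncated_spectrum(freq, decay, n_keep, n_total):
    """Magnitude spectrum of a damped cosine kept for n_keep samples and
    zero-filled to n_total points (freq is given in DFT bins)."""
    sig = [math.cos(2 * math.pi * freq * t / n_total) * math.exp(-decay * t)
           if t < n_keep else 0.0 for t in range(n_total)]
    return [abs(c) for c in dft(sig)]
```

With `freq=8` and 128 points, the full-length signal peaks sharply at bin 8, while keeping only 16 samples spreads the same line over many bins.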

9.
10.
Most traditional chromatographic separation criteria or response functions are defined on chromatograms recorded by single-channel detectors, e.g. a spectrometer measuring the absorbance at a single wavelength or a thermal conductivity detector. When peaks overlap seriously, information on the total number of chemical components, the degree of peak overlap and peak purity is usually lacking. This information characterizes crucial aspects of the separation process, and its absence leads to an inaccurate and misleading evaluation of separation quality as well as to computational ambiguity in many traditional response functions. In contrast, hyphenated chromatography-(multi-channel) spectroscopy instruments, together with chemometric methods, greatly increase the information content available in chromatographic detection. Such information, if properly used, can cast new light on the evaluation of chromatographic separation quality. The main objective of this article is to review chemometric methods devoted to estimating the number of chemical components, determining the elution sequence and assessing peak purity. Some newly defined response functions or separation criteria based on the information extracted by chemometric methods are also introduced. The methods reviewed are limited to those for treating two-way data obtained by hyphenating high-performance liquid chromatography with multi-channel detectors. We aim to provide a comprehensive view of such methods rather than a full list of all the methods developed. Some important methods are described in more detail to help researchers who are not very familiar with chemometrics understand and apply them.

11.
12.
Physical models of various phenomena are often represented by a mathematical model in which the output(s) of interest have a multivariate dependence on the inputs. Frequently, the underlying laws governing this dependence are not known, and one has to interpolate the mathematical model from a finite number of output samples. Multivariate approximation is normally viewed as suffering from the curse of dimensionality, as the number of sample points needed to learn the function to sufficient accuracy increases exponentially with the dimensionality of the function. However, the outputs of most physical systems are mathematically well behaved, and the scarcity of the data is usually compensated for by additional assumptions on the function (i.e., imposition of smoothness conditions or confinement to a specific function space). High dimensional model representations (HDMR) are a particular family of representations in which each term reflects the individual or cooperative contributions of the inputs to the output. The main assumption of this paper is that, for most well-defined physical systems, the output can be approximated by the sum of these hierarchical functions, whose dimensionality is much smaller than the dimensionality of the output. This ansatz can dramatically reduce the sampling effort needed to represent the multivariate function. HDMR has a variety of applications in which an efficient representation of multivariate functions is needed with scarce data. The formulation of HDMR in this paper assumes that the data are randomly scattered throughout the domain of the output. Under these conditions and the assumptions underlying HDMR, it is argued that the number of samples needed for a representation to a given tolerance is invariant to the dimensionality of the function, thereby providing a very efficient means of performing high dimensional interpolation. Selected applications of HDMR are presented from sensitivity analysis and time-series analysis.
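The hierarchical idea behind HDMR can be sketched with its simplest variant, first-order cut-HDMR, which expands f around a reference point as f(x) ≈ f0 + Σ f_i(x_i). This variant is substituted here for illustration only. The paper's formulation uses randomly scattered samples, which determines the component functions differently. The function name is hypothetical.

```python
def cut_hdmr_first_order(f, cut):
    """First-order cut-HDMR expansion of f around the reference point `cut`.

    f(x) ~ f0 + sum_i f_i(x_i), with f0 = f(cut) and
    f_i(x_i) = f(cut with component i replaced by x_i) - f0.
    """
    f0 = f(cut)

    def component(i, xi):
        point = list(cut)
        point[i] = xi
        return f(point) - f0

    def approx(x):
        return f0 + sum(component(i, xi) for i, xi in enumerate(x))

    return approx
```

The first-order expansion is exact for additive functions. A cooperative term such as x0*x1 is missed entirely and would require the second-order component functions.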

13.
The reduction of infrared spectra representations using the fast Fourier transform (FFT) and the fast Hadamard transform (FHT) is discussed. It is shown that the information content for different truncations varies in the same way for both transformations, while the FHT executes about 8 times faster than the FFT. The reduction of the information content is illustrated by comparing the hierarchical ordering of clusters of the same set of infrared spectra using representations of different lengths. The same basic pattern of three clusters was obtained even when the representations were shortened by more than 96% relative to the original representations.
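A fast Walsh-Hadamard transform needs only additions and subtractions, which is one reason it can outrun an FFT. The sketch below uses natural (Hadamard) ordering and no normalisation. It truncates the coefficient vector and inverts it. Which coefficients the authors kept, and in what ordering, is not stated in the abstract, so keeping the first `keep` coefficients in natural order is an assumption. A low-sequency truncation would first reorder the coefficients.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform (natural order, unnormalised).
    The length must be a power of two; uses O(N log N) additions/subtractions."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a

def reduce_and_reconstruct(spectrum, keep):
    """Keep the first `keep` Hadamard coefficients and invert (H^-1 = H / N)."""
    coeffs = fwht(spectrum)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    n = len(spectrum)
    return [c / n for c in fwht(truncated)]
```

Keeping every coefficient reproduces the spectrum exactly. Keeping only the first coefficient leaves just its mean value.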

14.
Knowledge of structural classes is useful for understanding folding patterns in proteins. Although existing structural class prediction methods have applied virtually all state-of-the-art classifiers, many of them use a relatively simple protein sequence representation that often includes amino acid (AA) composition. To address this, we propose a novel sequence representation that incorporates evolutionary information encoded using PSI-BLAST profile-based collocation of AA pairs. We used six benchmark datasets and five representative classifiers to quantify and compare the quality of structural class prediction with the proposed representation. The best classifier, a support vector machine, achieved 61-96% accuracy on the six datasets. These predictions were comprehensively compared with a wide range of recently proposed methods for structural class prediction. This comparison shows the superiority of the proposed representation, which yields error rate reductions of 14% to 26% compared with the predictions of the best-performing previously published classifiers on the considered datasets. The study also shows that, for the benchmark datasets that include sequences characterized by low identity (i.e., 25%, 30%, and 40%), the prediction accuracies are 20-35% lower than for the other three datasets, which include sequences with a higher degree of similarity. In conclusion, the proposed representation is shown to substantially improve the accuracy of structural class prediction. A web server that implements the presented prediction method is freely available at http://biomine.ece.ualberta.ca/Structural_Class/SCEC.html.
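The following is a hedged sketch of profile-based collocation features. For each ordered residue pair (a, b) and a gap of k intervening positions, it averages the product of profile probabilities P_i(a) · P_{i+k+1}(b) along the sequence. The per-position probability dictionaries are assumed to come from a PSI-BLAST PSSM after some normalisation. The authors' exact transformation and normalisation are not given in the abstract, and the function name is hypothetical.

```python
def collocation_features(profile, gap, alphabet):
    """Average profile-weighted collocation of residue pairs separated by `gap`
    intervening positions.

    profile: list of per-position dicts mapping residue -> probability.
    Returns {(a, b): value} over all ordered pairs of the alphabet.
    """
    span = gap + 1
    n_pairs = len(profile) - span
    feats = {(a, b): 0.0 for a in alphabet for b in alphabet}
    if n_pairs <= 0:
        return feats  # sequence too short for this gap
    for i in range(n_pairs):
        p, q = profile[i], profile[i + span]
        for a in alphabet:
            for b in alphabet:
                feats[(a, b)] += p.get(a, 0.0) * q.get(b, 0.0)
    return {k: v / n_pairs for k, v in feats.items()}
```

With a one-hot "profile" the features reduce to ordinary k-spaced pair frequencies, so the profile version can be read as a soft, evolution-aware generalisation of pair composition. With the 20 amino acids, each gap contributes 400 features.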

15.
This paper supersedes previous theoretical approaches to conceptual DFT because it provides a unified and systematic approach to all of the commonly considered formulations of conceptual DFT, and even provides the essential mathematical framework for new formulations. Global, local, and nonlocal chemical reactivity indicators associated with the "closed-system representation" [N_α, N_β, ν_α(r), ν_β(r)] of spin-polarized density functional theory (SP-DFT) are derived. The links between these indicators and those associated with the "open-system representation" [μ_α, μ_β, ν_α(r), ν_β(r)] are derived, including the spin-resolved Berkowitz-Parr identity. The Legendre transform to the "density representation" [ρ_α(r), ρ_β(r)] is performed, and the spin-resolved Harbola-Chattaraj-Cedillo-Parr identities linking the density representation to the closed-system and open-system representations are derived. Taken together, these results provide the framework for understanding chemical reactions from both the electron-following perspective (using either the closed-system or the open-system representation) and the electron-preceding perspective (density representation). A powerful matrix-vector notation is developed; with this notation, identities in conceptual DFT become universal. Specifically, this notation allows the fundamental identities in conventional (spin-free) conceptual DFT, the [N_α, N_β] representation, and the [N = N_α + N_β, N_S = N_α - N_β] representation to be written in exactly the same forms. In cases where spin transfer and electron transfer are coupled (e.g., radical + molecule reactions), we believe that the [N_α, N_β] representation may be more useful than the more common [N, N_S] representation.
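As a small worked example of the change of representation mentioned above, the chain rule links first and second derivatives in the [N_α, N_β] and [N, N_S] representations through a single constant 2×2 matrix. This sketch assumes that E is treated as differentiable in N_α and N_β at fixed ν_α(r) and ν_β(r), and it uses η for the matrix of second derivatives. Conventions for hardness-type quantities vary (some include a factor 1/2), so these symbols are illustrative rather than the paper's own.

```latex
\begin{aligned}
&N = N_\alpha + N_\beta,\quad N_S = N_\alpha - N_\beta
 \;\Longleftrightarrow\;
 N_\alpha = \tfrac12 (N + N_S),\quad N_\beta = \tfrac12 (N - N_S) \\[4pt]
&\mu_\sigma \equiv \frac{\partial E}{\partial N_\sigma},\qquad
 \eta_{\sigma\sigma'} \equiv \frac{\partial^2 E}{\partial N_\sigma\,\partial N_{\sigma'}}
 \qquad (\sigma,\sigma' \in \{\alpha,\beta\};\ \nu_\alpha(\mathbf r),\ \nu_\beta(\mathbf r)\ \text{fixed}) \\[4pt]
&\begin{pmatrix} \mu_N \\ \mu_S \end{pmatrix}
 = \mathbf M \begin{pmatrix} \mu_\alpha \\ \mu_\beta \end{pmatrix},\qquad
 \boldsymbol\eta_{[N,N_S]} = \mathbf M\, \boldsymbol\eta_{[N_\alpha,N_\beta]}\, \mathbf M^{\mathsf T},\qquad
 \mathbf M = \tfrac12 \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \\[4pt]
&\Rightarrow\; \mu_N = \tfrac12(\mu_\alpha + \mu_\beta),\quad
 \mu_S = \tfrac12(\mu_\alpha - \mu_\beta),\quad
 \eta_{NN} = \tfrac14(\eta_{\alpha\alpha} + 2\eta_{\alpha\beta} + \eta_{\beta\beta})
\end{aligned}
```

Because the map between the two sets of variables is linear, the second derivatives transform with M on both sides and no extra terms appear. This is the kind of identity a shared matrix-vector notation lets one write once for every representation.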

16.
17.
The concept of the markaracter is proposed in order to discuss marks and characters for a group of finite order on a common basis. Thus, we consider a non-redundant set of dominant subgroups and a non-redundant set of dominant representations (SDR), where coset representations concerning cyclic subgroups are named dominant representations (DRs). The numbers of fixed points corresponding to each DR are collected to form a row vector called a dominant markaracter (mark-character). The dominant markaracters for the SDR are collected into a markaracter table. The markaracter table is related to a subdominant markaracter table of its subgroup, so that the corresponding row of the former table can be constructed from the latter. The data of the markaracter table are in turn used to construct a character table of the group, after each character is regarded as a markaracter and transformed into a multiplicity vector. The concept of the orbit index is proposed to classify multiplicity vectors; the orbit index of each DR is proved to be equal to one, while that corresponding to an irreducible representation is equal to zero.
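The entries of a dominant markaracter table can be computed directly. The mark of the coset representation G(/H) at a cyclic subgroup ⟨c⟩ is the number of cosets gH fixed by c, since a coset fixed by c is fixed by every power of c. The sketch below does this for the symmetric group S3 acting on permutations of three objects. The group, its subgroups, and the function names are chosen for illustration only.

```python
from itertools import permutations

def compose(p, q):
    """(p o q)(i) = p[q[i]] for permutations stored as tuples."""
    return tuple(p[i] for i in q)

def left_cosets(group, subgroup):
    cosets = []
    for g in group:
        coset = frozenset(compose(g, h) for h in subgroup)
        if coset not in cosets:
            cosets.append(coset)
    return cosets

def mark(group, subgroup, generator):
    """Number of cosets gH left fixed by left multiplication with `generator`."""
    return sum(1 for coset in left_cosets(group, subgroup)
               if frozenset(compose(generator, x) for x in coset) == coset)

def dominant_markaracter_table(group, cyclic_subgroups, generators):
    """Rows: dominant representations G(/C); columns: cyclic subgroups <c>."""
    return [[mark(group, H, c) for c in generators] for H in cyclic_subgroups]
```

For S3, with rows G(/C1), G(/C2), G(/C3) and columns C1, C2, C3, this gives [[6, 0, 0], [3, 1, 0], [2, 0, 2]], a lower-triangular table as expected for a table of marks.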

18.
Since the very beginning of the discipline, chemometrics has mainly focussed on analytical chemical problems such as calibration. With the growing importance of databases and of applications in medicinal and computational chemistry, the domains of analytical chemistry and chemometrics have been enlarged significantly in recent years. In particular, the relation between molecular structure and function has attracted considerable interest. Despite the huge quantities of data available nowadays, it is often difficult to recognise and extract the chemical information relevant to the problem at hand. One of the main obstacles is the definition of an appropriate representation of a molecule. Although a variety of representations are used, none is generally applicable.

This paper focuses on the challenges that arise in the chemometric analysis of molecular structures, the relation between structure and function, and the relation between molecular representation and chemometric modelling. Exciting opportunities for further research are illustrated with an example concerning the prediction of the co-crystallisation behaviour of small organic molecules with cephalosporin antibiotics.


19.
Real-world applications inevitably entail divergence between the samples on which chemometric classifiers are trained and the unknowns requiring classification. This has long been recognized, but there is a shortage of empirical studies on which classifiers perform best in ‘external validation’ (EV), where the unknown samples are subject to sources of variation relative to the population used to train the classifier. A survey of 286 classification studies in analytical chemistry found that only 6.6% stated elements of variance between training and test samples. Instead, most tested classifiers using hold-outs or resampling (usually cross-validation) from the same population used in training. The present study evaluated a wide range of classifiers on NMR and mass spectra of plant and food materials from four projects with different data properties (e.g., different numbers and prevalence of classes) and classification objectives. Cross-validation was found to be optimistic relative to EV on samples of different provenance from the training set (e.g., different genotypes, growth conditions, or seasons of crop harvest). For classifier evaluations across the diverse tasks, we used rank-based non-parametric comparisons and permutation-based significance tests. Although latent variable methods (e.g., PLSDA) were used in 64% of the surveyed papers, they were among the less successful classifiers in EV, and orthogonal signal correction was counterproductive. Instead, the best EV performances were obtained with machine learning schemes that coped with the high dimensionality (914–1898 features). Random forests confirmed their resilience to high dimensionality, performing best overall on the full data despite being used in only 4.5% of the surveyed papers. Most other machine learning classifiers were improved by a feature selection filter (ReliefF) but still did not outperform random forests.
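The difference between resampling from one population and external validation can be made concrete with index splits. The sketch below contrasts ordinary k-fold cross-validation with a leave-one-group-out split, in which each test fold is one whole provenance group (for example, one harvest season), so that no group seen in training reappears at test time. This is a generic illustration rather than the study's exact EV design, and the group labels are hypothetical.

```python
def leave_one_group_out(groups):
    """Yield (group, train_idx, test_idx); each test fold is one whole
    provenance group that never appears in the corresponding training set."""
    for g in sorted(set(groups)):
        test = [i for i, x in enumerate(groups) if x == g]
        train = [i for i, x in enumerate(groups) if x != g]
        yield g, train, test

def k_fold(n, k):
    """Ordinary k-fold cross-validation indices (ignores provenance)."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

With ordinary k-fold splits, samples from the same season end up on both sides of a split, which is one way cross-validation can look optimistic compared with EV.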

20.
In spectroscopy, measured spectra are typically plotted as a function of wavelength (or wavenumber) but analysed with multivariate data analysis techniques (multiple linear regression (MLR), principal components regression (PCR), partial least squares (PLS)) that treat the spectrum as a set of m different variables. From a physical point of view, it could be more informative to describe the spectrum as a function rather than as a set of points, thereby taking into account its physical background: a sum of absorption peaks of the different chemical components, in which the absorbances at two nearby wavelengths are highly correlated. In the first part of this contribution, a motivating example for this functional approach is given. In the second part, the potential of functional data analysis in chemometrics is discussed and compared with the ubiquitous PLS regression technique using two practical data sets. It is shown that, for spectral data, B-splines are an appealing basis for describing the data accurately. When both functional data analysis and PLS are applied to the data sets, the predictive ability of functional data analysis is found to be comparable to that of PLS. Moreover, many chemometric datasets have a specific structure (e.g. replicate measurements on the same object, or objects that are grouped), but this structure is often removed before analysis (e.g. by averaging the replicates). In the second part of this contribution, we suggest a method to adapt traditional analysis of variance (ANOVA) methods to datasets with spectroscopic data. In particular, the possibilities for exploring and interpreting sources of variation, such as variations in sample and ambient temperature, are examined. Copyright © 2008 John Wiley & Sons, Ltd.
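B-spline bases of the kind mentioned above are usually evaluated with the Cox-de Boor recursion. The sketch below builds a clamped cubic basis on an assumed wavelength range. A spectrum would then be represented by the coefficients of a least-squares fit of these basis functions, which is not shown. The knot placement, range, and function names are assumptions.

```python
def bspline_basis(i, p, t, x):
    """Cox-de Boor recursion: value at x of the i-th B-spline of degree p on knots t."""
    if p == 0:
        # Half-open intervals; the last non-empty interval is also closed on the right.
        if t[i] <= x < t[i + 1]:
            return 1.0
        if x == t[-1] and t[i] < t[i + 1] == t[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    if t[i + p] != t[i]:
        left = (x - t[i]) / (t[i + p] - t[i]) * bspline_basis(i, p - 1, t, x)
    if t[i + p + 1] != t[i + 1]:
        right = (t[i + p + 1] - x) / (t[i + p + 1] - t[i + 1]) * bspline_basis(i + 1, p - 1, t, x)
    return left + right

def clamped_knots(lo, hi, n_inner, p):
    """Clamped knot vector with n_inner equally spaced interior knots."""
    step = (hi - lo) / (n_inner + 1)
    inner = [lo + step * (j + 1) for j in range(n_inner)]
    return [lo] * (p + 1) + inner + [hi] * (p + 1)

def design_row(x, t, p):
    """One row of the basis matrix used to express a spectrum as a spline."""
    n_basis = len(t) - p - 1
    return [bspline_basis(i, p, t, x) for i in range(n_basis)]
```

Each basis function is non-zero only over p + 1 neighbouring knot intervals, and the functions sum to one everywhere in the range. This local support matches the observation that absorbances at nearby wavelengths are highly correlated.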


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417