Similar literature
 Found 20 similar articles; search took 31 ms
1.
The performance of the new probabilistic classification method CLASSY is evaluated on three different data sets, together with its predecessors SIMCA and ALLOC. The improvement made over ALLOC is only marginal, whereas CLASSY shows better predictive ability and greater reliability than SIMCA in most cases.

2.
Probabilistic classification (i.e., classification of individuals into one of several groups by assigning probabilities of classification to each individual) is desirable when the main interest is in individuals rather than the whole group. The evaluation of probabilistic assignments is described in detail, including statistical features such as measures for the sharpness of the classification, the predictive ability and the reliability of the probability values. In a simulation study, the influence of the object-to-variable ratio and the interclass distance on the results was examined for the training data themselves (resubstitution method), an independent test set, and a pseudo-independent test set created from the training set (leave-one-out method). The results indicate that the leave-one-out method can often be used instead of an independent test set. In many cases, the assignments cited as probabilities are not probabilities at all, because the classification system is overconfident.
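The leave-one-out evaluation described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's procedure: the one-dimensional Gaussian classifier, the equal priors, the toy data, and the particular definitions of sharpness (mean of the highest class probability) and overconfidence (sharpness minus the fraction correctly classified) are all assumptions chosen for brevity.

```python
from math import exp, pi, sqrt
from statistics import mean, pvariance


def gaussian_pdf(x, mu, var):
    """Density of a 1-D normal distribution with mean mu and variance var."""
    return exp(-((x - mu) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def class_probabilities(x, train):
    """Posterior class probabilities for x, assuming equal priors and one
    1-D Gaussian per class (an illustrative stand-in classifier)."""
    by_class = {}
    for value, label in train:
        by_class.setdefault(label, []).append(value)
    density = {}
    for label, values in by_class.items():
        var = max(pvariance(values), 1e-6)  # floor avoids division by zero
        density[label] = gaussian_pdf(x, mean(values), var)
    total = sum(density.values())
    if total == 0.0:  # every density underflowed: fall back to uniform
        return {label: 1.0 / len(density) for label in density}
    return {label: d / total for label, d in density.items()}


def leave_one_out(data):
    """Classify each object with a model trained on all the other objects."""
    results = []
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        results.append((class_probabilities(x, train), label))
    return results


def summarize(results):
    """Return (predictive ability, sharpness, overconfidence) as defined
    in the lead-in; these definitions are assumptions, not the paper's."""
    hits = [max(p, key=p.get) == label for p, label in results]
    confidence = [max(p.values()) for p, _ in results]
    accuracy = sum(hits) / len(hits)
    sharpness = sum(confidence) / len(confidence)
    return accuracy, sharpness, sharpness - accuracy


# Hypothetical toy data: (measurement, class label)
data = [(0.0, "a"), (0.1, "a"), (0.2, "a"), (5.0, "b"), (5.1, "b"), (5.2, "b")]
accuracy, sharpness, overconfidence = summarize(leave_one_out(data))
```

A positive overconfidence value would indicate that the stated probabilities are higher than the observed hit rate, which is the failure mode the abstract warns about.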

3.
The probabilistic SIMCA and CLASSY methods for multivariate classification are defined and explained in detail. The differences between the present algorithms and previous versions are described. Both probabilistic SIMCA and CLASSY methods construct principal-component class models and assume a normal distribution for the residuals. The methods differ in the distributional assumptions for the object scores within the class model space. Details are given for the construction of probability density functions which conform to the model assumptions, and which can be substituted in Bayes' theorem to obtain posterior classification probabilities.
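For reference, substituting class densities into Bayes' theorem takes the generic form below. The notation is an assumption introduced here, not taken from the paper: f_c is the probability density function for class c, \pi_c is its prior probability, and K is the number of classes.

```latex
P(c \mid \mathbf{x}) = \frac{\pi_c \, f_c(\mathbf{x})}{\sum_{k=1}^{K} \pi_k \, f_k(\mathbf{x})}
```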

4.
The new probabilistic versions of SIMCA and CLASSY described in Part 1 are evaluated. Their classification performance is found to be generally better than that of the old versions. The results are also compared with those of the ALLOC and SLDA classification methods. Generally overconfident behaviour of the new SIMCA and CLASSY methods as well as ALLOC and SLDA is noted for two of the three data sets investigated (Iris and two wine data sets).

5.
The possibilities of action-orientated pattern recognition with the supervised pattern recognition technique, ALLOC, are discussed. The emphasis is on the importance of defining overlapping regions between classes as a way of obtaining more information about the separation between classes. Action-orientated classification and feature selection with ALLOC are discussed using the results obtained for two data bases concerning the characterization of the functional state of the thyroid and the determination of the origin of milk samples.

6.
One of the disadvantages of SIMCA pattern recognition is its inability to produce probabilistic classifications. Attempts to correct this involve distributional assumptions. It appears that SIMCA can handle the residual error terms efficiently, but that inside the class model subspace a crude truncation is used for determining a “normal range”, inside which all points are treated as equal. An improvement is made by applying kernel density estimation to the scores inside the class model subspace in combination with a normal error distribution in the remaining dimensions (CLASSY method). The evaluation of these probabilistic classification methods is discussed theoretically.

7.
The feature selection procedure of ALLOC is compared with the SELECT procedure in the ARTHUR software package and with a procedure based on statistical tests in the SPSS software package. Since ALLOC classification is very sensitive to redundant variables, feature selection is necessary. This is not a disadvantage because detection of redundant variables is always desirable. The ALLOC selection procedure performs very well in the two applications considered here, i.e., differentiation of milk samples and characterization of thyroid function.

8.
This paper on the application of potential functions in pattern recognition introduces the software package ALLOC to analytical chemistry, emphasizing the methodology of classifying objects. ALLOC is compared with other classification techniques on the basis of two data sets and is shown to perform very well.

9.
10.
We extend the Kohn-Sham potential energy expansion (VE) to include variations of the kinetic energy density and use the VE formulation with a 6-31G* basis to perform a "Jacob's ladder" comparison of small molecule properties using density functionals classified as being either LDA, GGA, or meta-GGA. We show that the VE reproduces standard Kohn-Sham DFT results well if all integrals are performed without further approximation, and there is no substantial improvement in using meta-GGA functionals relative to GGA functionals. The advantages of using GGA versus LDA functionals become apparent when modeling hydrogen bonds. We furthermore examine the effect of using integral approximations to compute the zeroth-order energy and first-order matrix elements, and the results suggest that the origin of the short-range repulsive potential within self-consistent charge density-functional tight-binding methods mainly arises from the approximations made to the first-order matrix elements.

11.
12.
In order to simplify the choice between different kinetic methods used in differential scanning calorimetry, an interesting way for testing kinetic treatments is proposed, using simulated thermoanalytical curves computed from given kinetic parameters. Applied to the study of a polymerization, we tested the Freeman-Carroll, Ellerstein, multiple linear regression (reaction-order model) and Achar-Brindley-Sharp methods. The test of the validity of the methods is performed using the LSM parameter that represents the fit between the mathematical treatment used in the kinetic model and known data. The study reveals the importance of the number of points used, i.e. the resolution, in the thermoanalytical curve recording, especially for the Freeman-Carroll and Ellerstein methods, there being an increase in the relative error on all the kinetic parameters when the number of points is decreased. Maximum relative errors are reported for the pre-exponential factor calculations. Evaluation of the enthalpy error on the determination of the kinetic parameters has been performed. Simulations obtained with various enthalpies indicate the necessity in such cases of computing a relative dimensionless LSM parameter (relative to the amplitude of the phenomena) in order to compare different thermal effects.

13.
The calculation of error bars for quantities of interest in computational chemistry comes in two forms: (1) Determining the confidence of a prediction, for instance of the property of a molecule; (2) Assessing uncertainty in measuring the difference between properties, for instance between performance metrics of two or more computational approaches. While a former paper in this series concentrated on the first of these, this second paper focuses on comparison, i.e. how do we calculate differences in methods in an accurate and statistically valid manner. Described within are classical statistical approaches for comparing widely used metrics such as enrichment, area under the curve and Pearson’s product-moment coefficient, as well as generic measures. These are considered over single and multiple sets of data and for two or more methods that evince either independent or correlated behavior. General issues concerning significance testing and confidence limits from a Bayesian perspective are discussed, along with size-of-effect aspects of evaluation.

14.
This study compares results obtained with several chemometric methods: SIMCA, PLS2-DA, PLS2-DA with SIMCA, and PLS1-DA in two infrared spectroscopic applications. The results were optimized by selecting spectral ranges containing discriminant information. In the first application, mid-infrared spectra of crude petroleum oils were classified according to their geographical origins. In the second application, near-infrared spectra of French virgin olive oils were classified into five registered designations of origin (RDOs). PLS-DA outperformed SIMCA in classification performance for both applications. In both cases, the PLS1-DA classifications were 100% correct. The difficulties encountered with SIMCA analyses were explained by the criteria of spectral variance: when the ratio between inter-spectral variance and intra-spectral variance was close to the Fc (Fisher criterion) threshold, SIMCA analysis gave poor results. The discrimination power of the variable range selection procedure was estimated from the number of correctly classified samples.

15.
Ab initio HF, HF + MP2, LDA DFT, BLYP DFT, and B3LYP DFT calculations are compared in the case of 19 homopolypeptides in their β pleated sheet conformation. The results show that the B3LYP method provides good results for the fundamental gaps, as compared with the values estimated on the basis of available UV spectra and intermediate exciton calculations for PolyGly and PolyAla. The HF method gives the best agreement, using Koopmans' theorem for the ionization potential, taking the calculated VBmax values in the HF case if one compares them with the experimental ionization potentials of the 19 amino acids measured by mass spectroscopy. Finally, how these methods might be improved to determine the most stable conformations of the homopolypeptides is outlined. © 2004 Wiley Periodicals, Inc. Int J Quantum Chem, 2004

16.
17.
18.
LC-NMR utilizing ¹H and ²⁹Si NMR spectroscopy is ideally suited for the analysis of silicones. It is shown that reversed phase gradient LC-NMR surpasses standard gel permeation chromatography (GPC) and diffusion ordered spectroscopy (DOSY) in the analysis of model hydride terminated polydimethylsiloxane. ¹H and ²⁹Si NMR in the stopped-flow arrangement leads to full identification of the components. The concentration gradient introduces a dependence of the ²⁹Si shifts on solvent composition; this dependence can be substantially reduced by a proposed method of referencing. It is shown that the ADEQUATE version of the powerful but insensitive 2D INADEQUATE experiment can be used for complete line assignment.

19.
20.
Seventeen preprocessing methods have been applied to 524 low-resolution mass spectra of steroids before computing classifiers, which can recognize substructures in a steroid molecule. Best classification results have been obtained by normalization of peak height to local ion current (predictive abilities 85%) and with “significant” spectra that contain only the “most important” peaks (predictive abilities 84%).
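One plausible reading of "normalization of peak height to local ion current" is sketched below: each peak is divided by the summed intensity of all peaks within a fixed m/z window around it. The abstract does not define the window, so the function name, the `half_window` parameter and its default value, and the toy spectrum are all assumptions made for illustration.

```python
def normalize_to_local_ion_current(spectrum, half_window=7):
    """Divide each peak height by the summed height of all peaks whose m/z
    lies within +/- half_window of it (hypothetical window definition).

    spectrum: dict mapping integer m/z values to peak heights.
    """
    normalized = {}
    for mz, height in spectrum.items():
        # Local ion current: total intensity in the window around this peak
        local = sum(h for m, h in spectrum.items() if abs(m - mz) <= half_window)
        normalized[mz] = height / local if local > 0 else 0.0
    return normalized


# Hypothetical three-peak spectrum, not data from the paper
spectrum = {41: 10.0, 43: 30.0, 100: 5.0}
result = normalize_to_local_ion_current(spectrum, half_window=3)
```

With this scheme an isolated peak always normalizes to 1, so a small peak in a sparse region carries as much weight as a large peak in a crowded one, which may be why such a preprocessing step helps a substructure classifier.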
