Similar Articles (20 results)
1.
The selection of an appropriate calibration set is a critical step in multivariate method development. In this work, the effect of using different calibration sets, based on a previous classification of unknown samples, on the performance of partial least squares (PLS) regression models is discussed. As an example, attenuated total reflection (ATR) mid-infrared spectra of deep-fried vegetable oil samples from three botanical origins (olive, sunflower, and corn oil) with increasing polymerized triacylglyceride (PTG) content induced by a deep-frying process were employed. The use of a one-class-classifier partial least squares-discriminant analysis (PLS-DA) and a rooted binary directed acyclic graph tree provided accurate oil classification. Oil samples fried without foodstuff could be classified correctly, independent of their PTG content; however, class separation of oil samples fried with foodstuff was less evident. Double cross-model validation combined with permutation testing was used to validate the obtained PLS-DA classification models, confirming the results. To assess the usefulness of selecting an appropriate PLS calibration set, the PTG content was determined with PLS models based on the previously selected classes. In comparison to a PLS model calculated using a pooled calibration set containing samples from all classes, the root mean square error of prediction could be improved significantly using PLS models based on the calibration sets selected by PLS-DA, ranging between 1.06 and 2.91% (w/w).
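The classify-then-calibrate idea can be sketched with a minimal PLS1 implementation: fit one PLS model per class instead of one pooled model. The NIPALS-style code and the synthetic two-class data below are illustrative assumptions, not the authors' actual spectra or model.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Minimal PLS1 (NIPALS): returns centring info and a regression vector."""
    x_mean, y_mean = X.mean(0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        p = Xk.T @ t / (t @ t)
        W.append(w); P.append(p); q.append(yk @ t / (t @ t))
        Xk = Xk - np.outer(t, p)   # deflate X
        yk = yk - q[-1] * t        # deflate y
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, b

def pls1_predict(model, X):
    x_mean, y_mean, b = model
    return y_mean + (X - x_mean) @ b

rng = np.random.default_rng(1)
n, p = 40, 8
# Two synthetic "classes" whose spectra relate to the analyte differently
Xa, Xb = rng.normal(size=(n, p)), rng.normal(size=(n, p))
ba = rng.normal(size=p)
ya = Xa @ ba + 0.05 * rng.normal(size=n)
yb = Xb @ (-ba) + 0.05 * rng.normal(size=n)   # opposite relationship

train = slice(0, 30); test = slice(30, None)
pooled = pls1_fit(np.vstack([Xa[train], Xb[train]]),
                  np.concatenate([ya[train], yb[train]]), 4)
per_a = pls1_fit(Xa[train], ya[train], 4)

rmse = lambda yt, yp: np.sqrt(np.mean((yt - yp) ** 2))
rmsep_pooled = rmse(ya[test], pls1_predict(pooled, Xa[test]))
rmsep_class = rmse(ya[test], pls1_predict(per_a, Xa[test]))
print(rmsep_pooled, rmsep_class)
```

Because the two synthetic classes follow opposite regression relationships, the pooled model averages them out, while the per-class model tracks its own class, mirroring the RMSEP improvement reported above.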

2.
With the aim of obtaining a monitoring tool to assess the quality of water, a multivariate statistical procedure based on cluster analysis (CA) coupled with the soft independent modelling of class analogy (SIMCA) algorithm, providing an effective classification method, is proposed. The experimental data set, collected throughout the year 2004, comprised analytical parameters from 68 water sources in a vast southwest area of Paris. Nine variables carrying the most useful information were selected and investigated (nitrate, sulphate, chloride, turbidity, conductivity, hardness, alkalinity, coliforms and Escherichia coli). Principal component analysis provided considerable data reduction, gathering in the first two principal components the majority of the information, representing about 92.2% of the total variance. CA grouped samples belonging to different sites, distinctly correlating them with chemical variables, and a classification model was built by SIMCA. This model was optimised and validated and then applied to a new data matrix, consisting of the parameters measured during the year 2005 on the same objects, providing a fast and accurate classification of all the samples. Most of the examined sources appeared unchanged over the 2-year period, but five sources were assigned to different classes, owing to statistically significant changes in some characteristic analytical parameters.
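A SIMCA-style classifier of the kind used here can be sketched as a per-class principal component model that scores a new sample by its residual distance to each class. The numpy sketch below (synthetic nine-variable data, residual distance only, no F-test refinement) is a simplified illustration, not the exact algorithm of the study.

```python
import numpy as np

def simca_fit(X, n_pc):
    """Per-class PCA model: mean, loadings, and a pooled residual scale."""
    mu = X.mean(0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    P = Vt[:n_pc].T
    R = (X - mu) - (X - mu) @ P @ P.T
    s0 = np.sqrt((R ** 2).sum() / R.size)
    return mu, P, s0

def simca_distance(model, x):
    """Residual distance of sample x to the class model, scaled by s0."""
    mu, P, s0 = model
    r = (x - mu) - P @ (P.T @ (x - mu))
    return np.sqrt((r ** 2).mean()) / s0

rng = np.random.default_rng(0)
# Two synthetic "water source" classes, offset along two of nine variables
class1 = rng.normal(0.0, 1.0, size=(30, 9))
class2 = rng.normal(0.0, 1.0, size=(30, 9)) + np.r_[6.0, 6.0, np.zeros(7)]
models = [simca_fit(class1, 2), simca_fit(class2, 2)]

new_sample = class2[0]   # should fall inside class 2's model
d = [simca_distance(m, new_sample) for m in models]
print(d)
```

Assigning the sample to the class with the smallest residual distance reproduces SIMCA's core decision rule; a full implementation would add per-class distance cutoffs so that a sample can also belong to no class.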

3.
The objective of this work was to apply artificial neural networks (ANNs) to the classification of 43 derivatives of phenylcarbamic acid. To find the appropriate clusters, Kohonen topological maps were employed. As input data, thermal parameters obtained during DSC and TG analysis were used. Input feature selection (IFS) algorithms were used to estimate the relative importance of the various input variables. Additionally, sensitivity analysis was carried out to eliminate the less important thermal variables. As a result, one classification model was obtained, which can assign our compounds to an appropriate class. Because the classes contain groups of structurally related molecules, it is possible to predict the structure of the compounds (for example, the position of the alkoxy substituent on the phenyl ring) on the basis of the obtained parameters.
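A Kohonen map of the kind used for the initial clustering fits in a few lines of numpy. The grid size, learning schedule, and the synthetic "thermal parameter" data below are illustrative assumptions, not the settings of the study.

```python
import numpy as np

def train_som(X, rows=3, cols=3, epochs=100, lr0=0.5, sigma0=1.5, seed=0):
    """Minimal Kohonen self-organising map with exponential decay schedules."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(rows * cols, X.shape[1]))
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    for e in range(epochs):
        lr = lr0 * np.exp(-e / epochs)
        sigma = sigma0 * np.exp(-e / epochs)
        for x in X[rng.permutation(len(X))]:
            bmu_i = np.argmin(((W - x) ** 2).sum(1))   # best-matching unit
            h = np.exp(-((grid - grid[bmu_i]) ** 2).sum(1) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)             # pull units toward x
    return W

rng = np.random.default_rng(1)
# Two synthetic groups of "thermal parameters" (e.g. DSC/TG features)
g1 = rng.normal(0.0, 0.3, size=(20, 4))
g2 = rng.normal(5.0, 0.3, size=(20, 4))
W = train_som(np.vstack([g1, g2]))

bmu = lambda x: int(np.argmin(((W - x) ** 2).sum(1)))
print(bmu(g1.mean(0)), bmu(g2.mean(0)))
```

After training, structurally distinct groups map to different regions of the grid, which is what makes the map usable for cluster discovery before building the classifier.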

4.
This article discusses problems of validating classification models, especially for datasets where sample sizes are small and the number of variables is large. It describes the use of the percentage correctly classified (%CC) as an indicator of the success of a classification model. For small datasets, %CC should not be used uncritically, and its interpretation depends on sample size. It illustrates the use of a common classification method, discriminant partial least squares (D-PLS), on a randomly generated dataset of 200 samples and 200 variables.

An aim of the classifier is to determine whether the null hypothesis (there is no distinction between two classes) can be rejected. Autoprediction gives an 84.5% CC. It is shown that, if there is variable selection, it must be performed independently on the training set to obtain a CC close to 50% on the test set; otherwise, over-optimistic and false conclusions can be reached about the ability to classify samples into groups.

Finally, two aims of determining the quality of a model are frequently confused, namely optimisation (often used to determine the most appropriate number of components in a model) and independent validation; to overcome this, the data should be split into three groups.

Difficulties often arise in model building when validation and optimisation have been performed on different groups of samples, especially with iterative methods, since each group may be modelled with different properties, such as a different number of components or different variables.
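The central warning above can be demonstrated on null data: selecting variables using all samples (test set included) inflates the apparent %CC, whereas selecting on the training set alone leaves the test %CC near chance. The t-statistic ranking and nearest-centroid classifier below are illustrative stand-ins for D-PLS, not the article's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 200, 5                  # samples, variables, variables kept
X = rng.normal(size=(n, p))            # pure noise: the null hypothesis holds
y = np.repeat([0, 1], n // 2)          # two arbitrary classes, no real signal
idx = rng.permutation(n)
train, test = idx[:100], idx[100:]

def top_k_by_t(Xs, ys, k):
    """Rank variables by |mean difference| / pooled spread between classes."""
    d = Xs[ys == 0].mean(0) - Xs[ys == 1].mean(0)
    s = np.sqrt(Xs[ys == 0].var(0) + Xs[ys == 1].var(0)) + 1e-12
    return np.argsort(-np.abs(d / s))[:k]

def cc_percent(sel):
    """Nearest-centroid %CC on the test set using the selected variables."""
    c0 = X[train][y[train] == 0][:, sel].mean(0)
    c1 = X[train][y[train] == 1][:, sel].mean(0)
    Z = X[test][:, sel]
    pred = (((Z - c1) ** 2).sum(1) < ((Z - c0) ** 2).sum(1)).astype(int)
    return 100.0 * (pred == y[test]).mean()

honest_cc = cc_percent(top_k_by_t(X[train], y[train], k))  # selection on train only
leaky_cc = cc_percent(top_k_by_t(X, y, k))                 # selection sees the test set
print(honest_cc, leaky_cc)
```

The honest %CC hovers around 50% as it should for random data, while the leaky selection typically scores well above it; splitting the data into three groups (training, optimisation, validation) extends the same discipline to component-number tuning.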


5.
The origin of medieval glass artefacts is studied by using a supervised learning technique, which is shown to be helpful when samples cannot be identified by typical design and appearance. A set of seventy pieces of glass was analyzed for ten trace elements by optical emission spectrography. The data matrix of 33 known objects from five origins was evaluated by multivariate variance and discriminant analysis in a training step. The extracted non-elementary discriminant functions were used to classify the 37 unidentified samples. The classification result is discussed in terms of its cultural/historical information content.

6.
A unique feature of the work carried out in the Collaborative Research Center 3MET continues to be its emphasis on innovative, advanced experimental methods which hyphenate mass selection with further analytical tools, such as laser spectroscopy, for the study of isolated molecular ions. This allows the intrinsic properties of the species of interest to be probed free of perturbing solvent or matrix effects. This review explains these methods and uses examples from past and ongoing 3MET studies of specific classes of multicenter metal complexes to illustrate how coordination chemistry can be advanced by applying them. As a corollary, we show how the challenges involved in providing well-defined (for example, monoisomeric) samples of the molecular ions have helped to further improve the methods themselves, thus making them applicable to many other areas of chemistry.

7.
In this tutorial, we focus on validation from both a numerical and a conceptual point of view. The procedure often reported in the literature of (repeatedly) dividing a dataset randomly into a calibration and a test set must be applied with care. It can only be justified when there is no systematic stratification of the objects that will affect the validated estimates or figures of merit such as RMSE or R2. Typical levels of validation are repeatability, reproducibility, and instrument and raw-material variation. Examples of how one data set can be validated across this background information illustrate that it affects the figures of merit as well as the dimensionality of the models. Even more important is the robustness of the models for predicting future samples. Another aspect brought to attention is validation in terms of the overall conclusions when observing a specific system. One example is to apply several methods for finding the significant variables and to see whether there is a consensus subset that also matches what is reported in the literature or expected from the underlying chemistry.

8.
Sinkov NA, Harynuk JJ. Talanta 2011, 83(4):1079-1087
A novel metric termed cluster resolution is presented. This metric compares the separation of clusters of data points while simultaneously considering the shapes of the clusters and their relative orientations. Using cluster resolution in conjunction with an objective variable ranking metric allows for fully automated feature selection for the construction of chemometric models. The metric is based upon considering the maximum size of confidence ellipses around clusters of points representing different classes of objects that can be constructed without any overlap of the ellipses. For demonstration purposes we utilized PCA to classify samples of gasoline based upon their octane rating. The entire GC-MS chromatogram of each sample comprising over 2 × 10⁶ variables was considered. As an example, automated ranking by ANOVA was applied followed by a forward selection approach to choose variables for inclusion. This approach can be generally applied to feature selection for a variety of applications and represents a significant step towards the development of fully automated, objective construction of chemometric models.
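The geometric core of the metric, testing whether confidence ellipses around two classes can be drawn without overlap while respecting their shapes and orientations, can be sketched as follows. The boundary-sampling overlap check and the hard-coded 95% chi-squared quantile (5.991 for 2 degrees of freedom) are simplifying assumptions, not the authors' exact formulation.

```python
import numpy as np

CHI2_95_2DF = 5.991  # 95% chi-squared quantile, 2 degrees of freedom

def ellipse_boundary(mean, cov, n=360, q=CHI2_95_2DF):
    """Points on the confidence-ellipse boundary of a 2-D class."""
    L = np.linalg.cholesky(cov)
    th = np.linspace(0, 2 * np.pi, n, endpoint=False)
    circ = np.stack([np.cos(th), np.sin(th)])
    return mean + np.sqrt(q) * (L @ circ).T

def inside(points, mean, cov, q=CHI2_95_2DF):
    """True if any point lies inside the ellipse (squared Mahalanobis test)."""
    d = points - mean
    m2 = np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)
    return bool(np.any(m2 <= q))

def ellipses_overlap(m1, c1, m2, c2):
    # Overlap iff one boundary enters the other ellipse, or one contains the other
    return (inside(ellipse_boundary(m1, c1), m2, c2)
            or inside(ellipse_boundary(m2, c2), m1, c1)
            or inside(m1[None, :], m2, c2))

cov = np.array([[1.0, 0.6], [0.6, 1.0]])   # elongated, tilted clusters
far = ellipses_overlap(np.array([0.0, 0.0]), cov, np.array([10.0, 0.0]), cov)
near = ellipses_overlap(np.array([0.0, 0.0]), cov, np.array([3.0, 0.0]), cov)
print(far, near)
```

In the full method, the confidence level is grown until the ellipses just touch, and that maximum level (the cluster resolution) scores each candidate variable during ANOVA-ranked forward selection.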

9.
Chemical and physical analyses of malt, the main ingredient of beer, have been used to predict the concentration of certain volatile compounds in the finished beer. The prediction was done by means of partial least squares regression (PLS2) in SIMCA. The total data set as well as individual malt clusters were submitted to PLS analysis. The best prediction was obtained by separating the total object matrix into classes according to the similarity found by fuzzy pattern recognition (FCV). FCV was also used to separate the beer variables into classes and to select the subset of variables to be predicted. A joint approach of fuzzy pattern recognition to identify groups of samples and SIMCA-PLS2 to predict several dependent variables is suggested as a powerful tool in process-analytical chemistry.
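The fuzzy grouping step can be sketched with a basic fuzzy c-means loop. Note that FCV proper fits linear varieties rather than point prototypes; the point-prototype version below, with synthetic "malt" data, is a simplified stand-in.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
    """Basic fuzzy c-means: returns centres and an n × c membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))          # random initial memberships
    for _ in range(iters):
        Um = U ** m
        centres = (Um.T @ X) / Um.sum(0)[:, None]        # weighted class centres
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1) + 1e-12
        inv = d2 ** (-1.0 / (m - 1))                     # standard membership update
        U = inv / inv.sum(1, keepdims=True)
    return centres, U

rng = np.random.default_rng(1)
# Two synthetic malt groups in a 3-variable space
g1 = rng.normal(0.0, 0.5, size=(15, 3))
g2 = rng.normal(4.0, 0.5, size=(15, 3))
centres, U = fuzzy_cmeans(np.vstack([g1, g2]))
labels = U.argmax(1)
print(labels)
```

Each sample then carries graded memberships rather than a hard label; in the combined scheme, a separate PLS2 model is calibrated within each resulting group.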

10.
UNEQ, a method for supervised pattern recognition based on the assumption of multivariate normally distributed groups, is presented. The method belongs to the group of so-called class-modelling techniques, i.e., classification functions are developed for each of the training classes separately, on the basis of the similarities between the objects within a group. New classes can therefore be entered easily into a classification problem. The method also allows easy detection of outliers. For each individual sample, the degree of connection with all the training classes can be defined; if, for a given sample, this degree of class membership is low for all the classes, the object is considered an outlier. The mathematical background of UNEQ is described. The validation of the derived classification functions in terms of sensitivity, specificity and efficiency is discussed. The method is illustrated and compared to SIMCA (another class-modelling technique) by means of a data set that concerns the classification of olive oils according to their area of origin, based on fatty acid patterns. It is concluded that the UNEQ method can be very useful for classification purposes but requires the populations to be homogeneous, as is the case for other techniques. For the olive-oil data set, the performance of UNEQ is similar to or even better than that of SIMCA.
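The UNEQ decision rule, model each class by its mean and covariance, accept a sample when its squared Mahalanobis distance falls below a chi-squared cutoff, and flag it as an outlier when no class accepts it, can be sketched as follows. The hard-coded 95% cutoff for two variables and the synthetic data are illustrative assumptions.

```python
import numpy as np

CHI2_95_2DF = 5.991  # 95% quantile of chi-squared with 2 degrees of freedom

def uneq_fit(X):
    """Class model: mean vector and inverse covariance matrix."""
    return X.mean(0), np.linalg.inv(np.cov(X.T))

def memberships(x, models, crit=CHI2_95_2DF):
    """Indices of all classes whose model accepts x (may be empty: outlier)."""
    accepted = []
    for i, (mu, icov) in enumerate(models):
        d = x - mu
        if d @ icov @ d <= crit:   # squared Mahalanobis distance test
            accepted.append(i)
    return accepted

rng = np.random.default_rng(0)
classA = rng.normal([0, 0], 1.0, size=(50, 2))
classB = rng.normal([8, 0], 1.0, size=(50, 2))
models = [uneq_fit(classA), uneq_fit(classB)]

print(memberships(np.array([0.2, -0.1]), models))   # near A's centre
print(memberships(np.array([50.0, 50.0]), models))  # far from both: outlier
```

Because every class is modelled independently, adding a new class means fitting one more mean/covariance pair, and a sample accepted by no class is the outlier case described above.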

11.
Direct analysis of solid samples employing a laboratory-assembled electrothermal atomic absorption spectrometer is demonstrated to be a feasible approach for the determination of trace elements in plant tissue and hair samples for special applications in plant physiology and biomedical research. As an example, the kinetics of Cr uptake by cabbage and its distribution have been measured as a function of chromium speciation in the nutrient solution. Further, longitudinal concentration gradients of Cr, Pb and Cd have been measured in hair of various population groups exposed to different levels of these elements in ambient and/or occupational environments. The techniques are validated for the determination of these trace elements by neutron activation analysis, dissolution atomic absorption spectrometry and analysis of certified reference materials. Slurry sample introduction is found appropriate for routine trace element determination and in homogeneity testing. Direct sample introduction is indispensable in the analysis of very small (<1 mg) tissue biopsy samples for the determination of trace element distributions.

12.
13.
Chemistry is classically concerned with the connection of atoms and molecules into new functional units. The rules of connection have yet to be extended to the generation and connection of larger objects, whose dimensions are measured in nanometers. However, linking objects of this size through molecules approaching each other randomly is inefficient; instead, the principle of self-assembly is decisive, in which lyotropic structure formation or amphiphilic interactions play a significant role. As a result of the nature of the energetic driving forces, the objects generated in this way are often well-defined aggregate structures or highly symmetric volume phases. In contrast to “molecular chemistry”, the linking of larger objects also disregards the inherent borders of the classical fields of chemistry: for example, the nanoscale association of inorganic colloids with polymers affords hybrid materials that combine the physical properties of both partners. In this way, catalytic, optical, and electronic features of inorganic colloids might be combined with the mechanical characteristics of polymers such as film formation, elasticity, and melt processability.

14.
Multivariate classification methods are needed to assist in extracting information from analytical data. The most appropriate method for each problem must be chosen. The applicability of a method mainly depends on the distributional characteristics of the data population (normality, correlations between variables, separation of classes, nature of variables) and on the characteristics of the data sample available (numbers of objects, variables and classes, missing values, measurement errors). The CLAS program is designed to combine classification methods with evaluation of their performance, for batch data processing. It incorporates two-group linear discriminant analysis (SLDA), independent class modelling with principal components (SIMCA), kernel density estimation (ALLOC), and principal component class modelling with kernel density estimation (CLASSY). Most of these methods are implemented so as to give probabilistic classifications. Multiple linear regression is provided for, and other methods are scheduled. CLAS evaluates the classification method using the training set data (resubstitution), independent test data, and pseudo test data (leave-one-out method). This last method is optimized for faster computation. Criteria for classification performance and reliability of the given probabilities, etc. are determined. The package contains flexible possibilities for data manipulation, variable transformation and missing data handling.
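The three evaluation modes named above (resubstitution, an independent test set, and leave-one-out pseudo testing) differ only in which samples train the classifier for each prediction. A leave-one-out loop, shown here with a nearest-centroid stand-in classifier on synthetic data, makes the distinction concrete.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(0) for c in classes])

def nearest_centroid_predict(model, X):
    classes, centroids = model
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return classes[d2.argmin(1)]

def loo_accuracy(X, y):
    """Leave-one-out: each sample is predicted by a model trained without it."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        model = nearest_centroid_fit(X[keep], y[keep])
        hits += nearest_centroid_predict(model, X[i:i + 1])[0] == y[i]
    return hits / len(X)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(5, 1, (20, 4))])
y = np.repeat([0, 1], 20)

resub = (nearest_centroid_predict(nearest_centroid_fit(X, y), X) == y).mean()
loo = loo_accuracy(X, y)
print(resub, loo)
```

Resubstitution reuses the training data and is therefore optimistic on real problems; leave-one-out approximates independent testing without sacrificing samples, which is why CLAS optimises that loop for speed.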

15.
A simple radioisotope X-ray fluorescence method, with 1000 s irradiation of the samples by a ¹⁰⁹Cd source combined with principal component analysis, is described for determining the relative mass fractions of trace elements in majolica ceramics for provenance classification. Six provenances from Europe, represented by 29 samples used as standards and 12 unknown samples, were investigated and characterized using selected trace elements as the variables. The unknown samples had previously been assigned, but not definitively, by stylistic analysis and/or thermoluminescence measurements to the provenances of Teruel (Spain) and Holland. Because of the moderate fluorescence time, only the four net peak intensities of Pb, Rb, Sr and Zr could be used as variables. We also studied the effect of not including the Pb variable, since the clay matrices could have been contaminated in the glazing process or when the Pb-Sn enamel was removed. In both cases the results were more consistent with the stylistic analysis and thermoluminescence measurements when the Pb concentration variable was not considered. Principal component analysis employing the three remaining elements gave results similar to plotting the relative mass fractions on a ternary diagram.
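The data treatment, net peak intensities converted to relative fractions and then submitted to PCA, can be sketched as follows. The four-variable layout mirrors the Pb/Rb/Sr/Zr setup, but the intensity values and the two "provenances" are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic net peak intensities (Pb, Rb, Sr, Zr) for two provenances
prov1 = rng.normal([200, 50, 120, 30], [10, 5, 8, 3], size=(15, 4))
prov2 = rng.normal([80, 90, 60, 70], [10, 5, 8, 3], size=(15, 4))
X = np.vstack([prov1, prov2])

X_rel = X / X.sum(1, keepdims=True)        # closure: relative fractions per sample
Xc = X_rel - X_rel.mean(0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                          # PCA scores for the provenance plot
explained = s ** 2 / (s ** 2).sum()
print(explained)

# Dropping a possibly contaminated variable (e.g. Pb, column 0) before closure:
X_noPb = X[:, 1:] / X[:, 1:].sum(1, keepdims=True)
```

With only three variables left after removing Pb, the PCA score plot and a ternary diagram of the relative fractions carry essentially the same information, which matches the comparison made in the abstract.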

16.
Artificial Neural Networks (ANNs) have seen an explosion of interest over the last two decades and have been successfully applied in all fields of chemistry, particularly in analytical chemistry. Inspired by biological systems and originating from the perceptron, i.e. a program unit that learns concepts, ANNs are capable of gradual learning over time and of modelling extremely complex functions. In addition to the traditional multivariate chemometric techniques, ANNs are often applied for prediction, clustering, classification, modelling of a property, process control, procedural optimisation and/or regression of the obtained data. This paper presents the most common network architectures, such as Multi-layer Perceptrons (MLPs), Radial Basis Function (RBF) networks and Kohonen's self-organising maps (SOMs). Moreover, back-propagation (BP), the most widespread training algorithm used today, and its modifications, such as quick-propagation (QP) and Delta-bar-Delta, are also discussed. All architectures correlate input variables to output variables through non-linear, weighted, parameterised functions, called neurons, and various training algorithms have been developed to minimise the prediction error made by the network. The applications of ANNs in water analysis and water quality assessment are also reviewed. Most ANN works focus on modelling and parameter prediction. For water quality assessment, extended predictive models are constructed and optimised, while variable correlation and significance are usually estimated within the framework of the predictive or classifier models. By contrast, ANN models are not frequently used for clustering/classification purposes, although they seem to be an effective tool. ANNs have proved to be a powerful, yet often complementary, tool for water quality assessment, prediction and classification.
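A one-hidden-layer MLP trained by plain back-propagation, the architecture/algorithm pair the review centres on, can be written in a few lines of numpy. The layer width, learning rate, and toy regression target below are illustrative choices, not those of any reviewed application.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 40).reshape(-1, 1)
y = X ** 2                                   # toy non-linear target

h = 8                                        # hidden neurons
W1 = rng.normal(0, 0.5, (1, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.5, (h, 1)); b2 = np.zeros(1)
lr = 0.1

mse0 = None
for epoch in range(3000):
    # forward pass
    A = np.tanh(X @ W1 + b1)
    out = A @ W2 + b2
    err = out - y
    mse = (err ** 2).mean()
    if mse0 is None:
        mse0 = mse                           # error before any training
    # backward pass: gradients of the mean squared error
    g_out = 2 * err / len(X)
    gW2 = A.T @ g_out; gb2 = g_out.sum(0)
    g_A = g_out @ W2.T * (1 - A ** 2)        # tanh derivative
    gW1 = X.T @ g_A; gb1 = g_A.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(mse0, mse)
```

Variants such as quick-propagation and Delta-bar-Delta modify only the weight-update step of this loop (adapting the effective learning rate per weight); the forward and backward passes stay the same.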

17.
The mathematical and statistical evaluation of environmental data gains increasing importance in environmental chemistry as the data sets become more complex. It is inarguable that different mathematical and statistical methods should be applied in order to compare results and to enhance the possible interpretation of the data. Very often several aspects have to be considered simultaneously, for example several chemicals, entailing a data matrix with objects (rows) and variables (columns). In this paper a data set is given concerning the pollution of 58 regions in the state of Baden-Württemberg, Germany, which are polluted with the metals lead, cadmium and zinc, and with sulfur. For pragmatic reasons the evaluation is performed on the dichotomized data matrix. First, this dichotomized 58 × 13 data matrix is evaluated by the Hasse diagram technique, a multicriteria evaluation method which has its scientific origin in discrete mathematics. Then the Partially Ordered Scalogram Analysis with Coordinates (POSAC) method is applied; it reduces the data matrix by plotting it in a two-dimensional space, and a small, given percentage of information is lost in this method. Important priority objects, such as maximal and minimal objects (highly and lowly polluted regions), can easily be detected by the Hasse diagram technique and POSAC. Two variables attained exceptional importance in the data analysis shown here: TLS, sulfur found in the tree layer, is difficult to interpret and needs further investigation, whereas LRPB, lead in Lumbricus rubellus, seems a satisfying result because the earthworm is commonly discussed in the ecotoxicological literature as a specific and highly sensitive bioindicator.

18.
It is shown that activation analysis is especially suited to serve as a basis for determining the chemical similarity between samples defined by their trace element concentration patterns. The general problem of classification and identification is discussed. The nature of possible classification structures and their appropriate clustering strategies is considered. A practical computer method is suggested, and its application as well as the graphical representation of classification results are given. The possibility of classification using information theory is mentioned. Classification of chemical elements is discussed and practically realized after Hadamard transformation of the concentration variation patterns in a series of samples.

19.
Multivariate discriminant analysis for cancer diagnosis
The contents of 15 elements in hair samples from healthy subjects and cancer patients were determined by inductively coupled plasma atomic emission spectrometry and graphite furnace atomic absorption spectrometry. The data were processed by multivariate polynomial expansion to increase the dimensionality, stepwise-regression variable compression, and the PLS method, yielding a two-dimensional discriminant plot with very clear separation between patients and healthy subjects. On this basis, hair can be used in place of blood as the analytical sample in clinical cancer diagnosis.

20.
Sixteen samples of three types (classes) of brain tissue were characterized by capillary gas chromatography (g.c.). Each sample is thus characterized by the peak heights of 105 peaks in each g.c. profile. SIMCA pattern recognition is used to analyze the 16 × 105 data matrix in order to differentiate between the three classes on the basis of the g.c. data only. The SIMCA method is therefore applicable even when the number of variables (105) exceeds the number of objects (16). The results indicate that g.c. profiles are useful for the identification of brain tissue type.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号