Similar literature
1.
J.I. Villegas, G. Addová, T. Salmi. Talanta, 2007, 72(4): 1573-1580
Two SIMCA models were developed for the classification of acyclic octene isomers, which form only a fraction of the very complex product mixture obtained, for example, from the transformation of 1-butene. The effects of spectral transformation, namely autocorrelation and logarithmic intensity-ratio transforms, and of (square-root) scaling of the octene isomers' mass-spectral data were investigated. Both the spectral-feature preprocessing methods and the scaling were found to be vital for adequately developing and improving the classification models. The best SIMCA models were successfully applied to gas chromatography-mass spectrometry (GC-MS) data collected from the dimerization of 1-butene over heterogeneous catalysts in the liquid phase.

2.
The performance of an inexpensive, inductive rule-building expert shell system, based on the ID3 algorithm, was compared to that of SIMCA class modeling in classifying the binary mass spectra of 78 toxic and related compounds. The compressed mass spectra consisted of 17 masses chosen by using information theory. The expert rules verified the six main classes and two subclasses found with SIMCA class modeling. These classes were: all benzenes and all alkanes/ alkenes (alka(e)nes); nonhalobenzenes, chlorobenzenes, bromoalka(e)nes, and chloroalka(e)nes; and mono-, dichloroalka(e)nes and polychloroalka(e)nes. Training set classification accuracies obtained with the expert system were 93–100% as opposed to 62–98% for SIMCA. For 73 compounds, the expert rules gave a classification accuracy of 97–100% vs. 79–96% for SIMCA. Predictive accuracy for the four main classes was 78%. In general, fewer masses were involved with the rules than with the SIMCA models, and the rules are normally optimized with regard to minimum number of steps in the rule, not minimum number of variables. The expert rules work best with closed sets of objects where all possibilities can be included in the training sets. The expert rules represent planes partitioning the multidimensional measurement space (hypercube) into a subvolume nearest the SIMCA cylinders for an appropriate class. Overall, the performance of the expert system was very good.

3.
With the aim of obtaining a monitoring tool to assess the quality of water, a multivariate statistical procedure is proposed that couples cluster analysis (CA) with the soft independent modelling of class analogy (SIMCA) algorithm to provide an effective classification method. The experimental data set, collected throughout 2004, was composed of analytical parameters from 68 water sources in a vast area southwest of Paris. Nine variables carrying the most useful information were selected and investigated (nitrate, sulphate, chloride, turbidity, conductivity, hardness, alkalinity, coliforms and Escherichia coli). Principal component analysis provided considerable data reduction, concentrating most of the information in the first two principal components, which account for about 92.2% of the total variance. CA grouped samples belonging to different sites, distinctly correlating them with chemical variables, and a classification model was built by SIMCA. This model was optimised and validated and then applied to a new data matrix, consisting of the parameters measured during 2005 for the same objects, providing a fast and accurate classification of all the samples. Most of the examined sources appeared unchanged over the 2-year period, but five sources were assigned to different classes, owing to statistically significant changes in some characteristic analytical parameters.

4.
Multivariate classification methods are needed to assist in extracting information from analytical data. The most appropriate method for each problem must be chosen. The applicability of a method mainly depends on the distributional characteristics of the data population (normality, correlations between variables, separation of classes, nature of variables) and on the characteristics of the data sample available (numbers of objects, variables and classes, missing values, measurement errors). The CLAS program is designed to combine classification methods with evaluation of their performance, for batch data processing. It incorporates two-group linear discriminant analysis (SLDA), independent class modelling with principal components (SIMCA), kernel density estimation (ALLOC), and principal component class modelling with kernel density estimation (CLASSY). Most of these methods are implemented so as to give probabilistic classifications. Multiple linear regression is provided for, and other methods are scheduled. CLAS evaluates the classification method using the training set data (resubstitution), independent test data, and pseudo test data (leave-one-out method). This last method is optimized for faster computation. Criteria for classification performance and reliability of the given probabilities, etc. are determined. The package contains flexible possibilities for data manipulation, variable transformation and missing data handling.

5.
UNEQ, a method for supervised pattern recognition based on the assumption of multivariate normally-distributed groups, is presented. The method belongs to the group of so-called class-modelling techniques, i.e., classification functions are developed for each of the training classes separately, on the basis of the similarities between the objects within a group. New classes can therefore be entered easily into a classification problem. The method also allows easy detection of outliers. For each individual sample, the degree of connection with all the training classes can be defined. If, for a given sample, this degree of class membership is low for all the classes, the object is considered an outlier. The mathematical background of UNEQ is described. The validation of the derived classification functions in terms of sensitivity, specificity and efficiency is discussed. The method is illustrated and compared to SIMCA (another class-modelling technique) by means of a data set that concerns the classification of olive oils according to their area of origin, based on fatty acid patterns. It is concluded that the UNEQ method can be very useful for classification purposes but, as is the case for other techniques, requires the populations to be homogeneous. For the olive-oil data set, the performance of UNEQ is similar to or even better than that of SIMCA.
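
The class-modelling idea in this abstract (one model per class, with a sample rejected by every class flagged as an outlier) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the toy data, the function names and the use of a squared Mahalanobis distance with a chi-square 95% cutoff are all choices made for the sketch, since the abstract does not give the actual critical value or distance formula.

```python
import math
import numpy as np

def fit_class_model(X):
    """Return (centroid, inverse covariance) estimated from one class only."""
    X = np.asarray(X, dtype=float)
    centroid = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)  # class-specific covariance matrix
    return centroid, np.linalg.inv(cov)

def squared_mahalanobis(x, model):
    """Squared Mahalanobis distance of sample x from a class centroid."""
    centroid, inv_cov = model
    d = np.asarray(x, dtype=float) - centroid
    return float(d @ inv_cov @ d)

def classify(x, models, critical):
    """Names of every class model that accepts x; an empty list means outlier."""
    return [name for name, model in models.items()
            if squared_mahalanobis(x, model) <= critical]

# Hypothetical two-variable training data for two classes.
class_a = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.2], [1.1, 2.1], [0.9, 1.8]]
class_b = [[4.0, 5.0], [4.2, 5.3], [3.8, 4.9], [4.1, 5.1], [3.9, 4.7]]
models = {"A": fit_class_model(class_a), "B": fit_class_model(class_b)}

# Assumed cutoff: the chi-square 95% quantile for 2 variables,
# which has the closed form -2 * ln(0.05).
critical = -2.0 * math.log(0.05)

print(classify([1.0, 2.0], models, critical))     # accepted by class A only
print(classify([10.0, -10.0], models, critical))  # rejected by both: outlier
```

Because each class is modelled independently, adding a new class only means adding one more entry to `models`; the existing models are left untouched.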

6.
MRM, multivariate range modeling, is based on models built as parallelepipeds in the space of the original variables and/or of discriminant variables such as those of linear discriminant analysis. The ranges of these variables define the boundary of the model. The ranges are increased by a "tolerance" factor to take into account the uncertainty of their estimate. MRM is compared with UNEQ (the modeling technique based on the hypothesis of multivariate normal distribution) and with SIMCA (based on principal components) by means of the sensitivities and specificities of the models, the estimates of type I (sensitivity) and II error rates (specificity) evaluated both with the final model built with all the available objects and by means of cross validation. UNEQ and SIMCA models were obtained with the usual critical significance value of 5% and with the model forced to accept all the objects of the modeled category. The performance parameters of the class models are critically discussed focusing on their uncertainty.
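
A range-based class model of this kind can be sketched in a few lines. The sketch below is an assumption-laden illustration rather than the published MRM procedure: the abstract says only that the ranges are increased by a tolerance factor, so widening each side by `tolerance` times the observed range, the function names and the toy data are all hypothetical.

```python
def fit_range_model(samples, tolerance=0.1):
    """Per-variable (low, high) bounds of one class, widened on each side.

    Assumption: each bound is pushed outward by tolerance * (max - min).
    """
    bounds = []
    for column in zip(*samples):  # iterate over variables, not objects
        low, high = min(column), max(column)
        margin = tolerance * (high - low)
        bounds.append((low - margin, high + margin))
    return bounds

def accepts(bounds, sample):
    """True if every variable of the sample lies inside the widened box."""
    return all(low <= value <= high
               for (low, high), value in zip(bounds, sample))

def acceptance_rate(bounds, samples):
    """Fraction of samples accepted: sensitivity if they belong to the
    modelled class, 1 - specificity if they belong to another class."""
    return sum(accepts(bounds, s) for s in samples) / len(samples)

# Hypothetical training objects of the modelled class (two variables).
training = [[1.0, 10.0], [2.0, 14.0], [3.0, 12.0]]
model = fit_range_model(training, tolerance=0.1)

print(accepts(model, [3.1, 10.0]))  # inside the widened box
print(accepts(model, [3.5, 12.0]))  # first variable out of range
```

Evaluating `acceptance_rate` on the training objects always gives 1.0 with this construction, which is why the abstract also reports cross-validated estimates.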

7.
This work describes multi-classification based on binary probabilistic discriminant partial least squares (p-DPLS) models, developed with the one-against-one strategy and the winner-takes-all principle. The multi-classification problem is split into binary classification problems with p-DPLS models. The results of these models are combined to obtain the final classification result. The classification criterion uses the specific characteristics of an object (position in the multivariate space and prediction uncertainty) to estimate the reliability of the classification, so that the object is assigned to the class with the highest reliability. This new methodology is tested with the well-known Iris data set and a data set of Italian olive oils. When compared with CART and SIMCA, the proposed method has better average classification performance, besides giving a statistic that evaluates the reliability of classification. For the olive oil set the average percentage of correct classification for the training set was close to 84% with p-DPLS against 75% with CART and 100% with SIMCA, while for the test set the average was close to 94% with p-DPLS as against 50% with CART and 62% with SIMCA.

8.
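A rapid method for discriminating authentic from counterfeit edible oils was established by combining Fourier transform infrared spectroscopy (FTIR) with soft independent modeling of class analogy (SIMCA). Relying on the fingerprint character of FTIR, the spectra of 53 qualified edible oils and 13 counterfeit edible oils were collected and analyzed. The spectra were preprocessed by taking the second derivative and normalizing, and feature variables were extracted by principal component analysis (PCA). The FTIR spectra of 43 qualified and 9 counterfeit oil samples, selected at random, formed the training set from which the SIMCA classification model for authentic and counterfeit oils was built. Validated on the remaining 10 qualified and 4 counterfeit oils, the model achieved a correct recognition rate of 100%. This indicates that FTIR combined with SIMCA may become a new method for rapidly identifying the authenticity of edible oils.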

9.
ASTM clustering for improving coal analysis by near-infrared spectroscopy
Andrés JM, Bona MT. Talanta, 2006, 70(4): 711-719
Multivariate analysis techniques have been applied to near-infrared (NIR) spectra of coals to investigate the relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding predictor variables. In this work, the whole set of coal samples was grouped into six more homogeneous clusters following the ASTM reference method for classification, prior to the application of calibration methods to each coal set. The results showed a considerable improvement in the determination error compared with the calibration for the whole sample set. For some groups, the established calibrations approached the quality required by the ASTM/ISO norms for laboratory analysis. To predict property values for a new coal sample, that sample must first be assigned to its respective group. Thus, the ability of Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) in the NIR range to discriminate and classify coal samples was also studied by applying Soft Independent Modelling of Class Analogy (SIMCA) and Linear Discriminant Analysis (LDA) techniques. Modelling of the groups by SIMCA led to overlapping models that cannot assign samples to a unique class. The application of Linear Discriminant Analysis, on the other hand, improved the classification of the samples, but not enough to be satisfactory for every group considered.

10.
The performance of the new probabilistic classification method CLASSY is evaluated on three different data sets, together with its predecessors SIMCA and ALLOC. The improvement made over ALLOC is only marginal, whereas CLASSY shows better predictive ability and greater reliability than SIMCA in most cases.

11.
NMR measurements coupled with pattern-recognition analysis offer a powerful mixture-analysis tool for latent-feature extraction and sample classification. As fundamental applications of this analysis for mixtures, the 1H spectra of 176 kinds of green, black, oolong and other tea infusions were acquired by a 500 MHz NMR spectrometer. Each spectrum pattern was analyzed by a multivariate statistical pattern-recognition method where Principal Component Analysis (PCA) was used in combination with Soft Independent Modeling of Class Analogy (SIMCA). SIMCA effectively selected variables that contribute to tea categorization. The final PCA resulted in clear classification reflecting the fermentation and processing of each tea, and revealed marker variables that include catechin and theanine peaks.

12.
Five different instrumental techniques (thermogravimetry and mid-infrared, near-infrared, ultraviolet and visible spectroscopies) have been used to characterize a high quality beer (Reale) from an Italian craft brewery (Birra del Borgo) and to differentiate it from other competing and lower quality products. Chemometric classification models were built on the separate blocks using soft independent modeling of class analogies (SIMCA) and partial least squares-discriminant analysis (PLS-DA), obtaining good predictive ability on an external test set (75% or higher depending on the technique). The use of data fusion strategies, in particular the mid-level one, to integrate the data from the different platforms allowed the correct classification of all the training and validation samples.

13.
Sixteen samples of three types (classes) of brain tissue were characterized by capillary gas chromatography (g.c.). Each sample is thus characterized by the peak heights of 105 peaks in each g.c. profile. SIMCA pattern recognition is used to analyze the 16 × 105 data matrix in order to differentiate between the three classes on the basis of the g.c. data only. The SIMCA method is therefore applicable even when the number of variables (105) exceeds the number of objects (16). The results indicate that g.c. profiles are useful for the identification of brain tissue type.

14.
Application of partial least squares to the identification of tea by infrared spectroscopy
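Diffuse reflectance Fourier transform infrared (FTIR) spectroscopy was combined with principal component analysis (PCA), partial least squares (PLS) and soft independent modeling of class analogy (SIMCA) to classify and discriminate thirteen kinds of tea. The results show that preprocessing the raw spectra by multiplicative scatter correction (MSC) improves the classification performance of the pattern recognition techniques. On this basis, recognition models were built from the tea infrared spectra in the 1900 to 900 cm⁻¹ region, and all three methods gave satisfactory classification. For all 130 samples in the test set, PCA failed to discriminate only two classes of samples, SIMCA gave recognition and rejection rates both above 90%, and PLS performed best, assigning every sample to the correct class. These results indicate that FTIR spectroscopy combined with chemometric methods enables rapid identification of tea varieties, offering a new approach to the objective evaluation of tea.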

15.
In this paper, the potential of coupling mid- and near-infrared spectroscopic fingerprinting techniques with chemometric classification methods for the traceability of extra virgin olive oil samples from the PDO Sabina was investigated. To this purpose, two different pattern recognition algorithms, representative of the discriminant (PLS-DA) and modeling (SIMCA) approaches to classification, were employed. Results obtained after processing the spectroscopic data by PLS-DA showed a rather high classification accuracy, with NIR providing better predictions than MIR (as evaluated both in cross-validation and on an external test set). SIMCA confirmed these results and showed that the category models for the class Sabina can be rather sensitive and highly specific. Lastly, as samples from two harvesting years (2009 and 2010) were investigated, it was possible to show that the production year can have a relevant effect on the spectroscopic fingerprint. Notwithstanding this, it was still possible to build models that are transferable from one year to another with good accuracy.

16.
A total of 2400 samples of commercial Brazilian C gasoline were collected over a 6-month period from different gas stations in São Paulo state, Brazil, and analysed with respect to 12 physicochemical parameters according to regulation 309 of the Brazilian Government Petroleum, Natural Gas and Biofuels Agency (ANP). The percentages (v/v) of hydrocarbons (olefins, aromatics and saturated) were also determined. Hierarchical cluster analysis (HCA) was employed to select 150 representative samples that exhibited the least similarity on the basis of their physicochemical parameters and hydrocarbon compositions. The chromatographic profiles of the selected samples were measured by gas chromatography with flame ionisation detection and analysed using the soft independent modelling of class analogy (SIMCA) method in order to create a classification scheme to identify gasolines that conform to ANP regulation 309. Following the optimisation of the SIMCA algorithm, it was possible to classify correctly 96% of the commercial gasoline samples in the training set of 100. In order to check the quality of the model, an external group of 50 gasoline samples (the prediction set) was analysed, and the developed SIMCA model classified 94% of these correctly. The developed chemometric method is recommended for screening commercial gasoline quality and detecting potential adulteration.

17.
When infrared spectral data are used in classification and/or multivariate regression methods, there can be problems related to both chemical understanding and computation speed due to the large number of wavenumbers in each spectrum. Here, it is shown that the Procrustes rotation technique can be used to select a minimum set of spectral variables (wavenumbers) to perform classification and regression. Procrustes rotation was coupled to several multivariate methods such as PLS, SIMCA and potential curves (a maximum likelihood classification method). The practical problem of implementing a screening methodology for classifying apple juice-based beverages according to their contents of "pure" apple juice was addressed using attenuated total reflectance mid-IR spectroscopy. It is found that two of the original wavenumbers are almost as good predictors as all 176 initial ones.

18.
Hydrogen magnetic resonance spectroscopy (1H-MRS) is a non-invasive technique which provides a 'frequency-signal intensity' spectrum of biochemical compounds of tissues in the body. Although this method is currently used in human brain studies, accurate classification of in-vivo 1H-MRS is a challenging task in the diagnosis of brain tumors. Problems such as overlapping metabolite peaks, incomplete information on the background component and low signal-to-noise ratio disturb the classification results of this spectroscopic method. This study presents an alternative approach to the soft independent modeling of class analogy (SIMCA) technique, using non-negative matrix factorization (NMF) for dimensionality reduction. In the adopted strategy, the performance of SIMCA was improved by application of a robust algorithm for classification in the presence of noisy measurements. A total of 219 spectra from two databases were taken by water-suppressed short echo-time 1H-MRS, acquired from different subjects with different stages of glial brain tumors (grade II (26 cases), grade III (24 cases), grade IV (41 cases), as well as 25 healthy cases). SIMCA was performed using two approaches: (i) principal component analysis (PCA) and (ii) non-negative matrix factorization (NMF), as a modified approach. The squared prediction error was used to assess the class membership of the external validation set. Finally, several figures of merit such as the correct classification rate (CCR), sensitivity and specificity were calculated. Results of SIMCA based on NMF showed a significant improvement in the percentage of correctly classified samples, 91.4% versus 83.5% for the PCA-based model, in an independent test set. Copyright © 2015 John Wiley & Sons, Ltd.
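
The PCA-based variant of SIMCA mentioned in this abstract, with class membership judged from the squared prediction error, can be sketched as below. This is a simplified illustration under stated assumptions, not the study's method: the toy data and function names are hypothetical, the sample is simply assigned to the class with the smallest error instead of being tested against a critical limit, and the NMF variant and the robust algorithm are not reproduced.

```python
import numpy as np

def fit_pca_class_model(X, n_components):
    """Fit a separate PCA model to the samples of one class."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centred class data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = vt[:n_components].T  # shape: (n_variables, n_components)
    return mean, loadings

def squared_prediction_error(x, model):
    """Squared norm of the part of x that the class PCA model cannot explain."""
    mean, loadings = model
    centered = np.asarray(x, dtype=float) - mean
    residual = centered - loadings @ (loadings.T @ centered)
    return float(residual @ residual)

def assign(x, models):
    """Simplification: pick the class whose model leaves the smallest error."""
    return min(models,
               key=lambda name: squared_prediction_error(x, models[name]))

# Hypothetical classes that vary mainly along different axes.
along_x = [[0.0, 0.0, 0.0], [1.0, 0.01, 0.0], [2.0, -0.01, 0.0], [3.0, 0.0, 0.01]]
along_z = [[0.0, 0.0, 0.0], [0.0, 0.01, 1.0], [0.01, 0.0, 2.0], [0.0, -0.01, 3.0]]
models = {
    "along_x": fit_pca_class_model(along_x, n_components=1),
    "along_z": fit_pca_class_model(along_z, n_components=1),
}

print(assign([5.0, 0.0, 0.0], models))
print(assign([0.0, 0.0, 5.0], models))
```

Swapping the SVD step for a non-negative factorization would change only `fit_pca_class_model` and the projection in `squared_prediction_error`; the residual-based membership decision would stay the same.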

19.
High throughput data are frequently observed in contemporary chemical studies. Classification through spectral information is an important issue in chemometrics. Linear discriminant analysis (LDA) fails in the large‐p‐small‐n situation for two main reasons: (1) the sample covariance matrix is singular when p > n and (2) there is an accumulation of noise in the estimation of the class centroid in high dimensional feature space. The Independence Rule is a class of methods used to overcome these drawbacks by ignoring the correlation information between spectral variables. However, a strong correlation is an essential characteristic of spectral data. We proposed a new correlation‐assisted nearest shrunken centroid classifier (CA‐NSC) to incorporate correlation information into the classification. CA‐NSC combines two sources of information [class centroid (mean) and correlation structure (variance)] to generate the classification. We used two real data analyses and a simulation study to verify our CA‐NSC method. In addition to NSC, we also performed a comparison with the soft independent modeling of class analogy (SIMCA) approach, which uses only correlation structure information for classification. The results show that CA‐NSC consistently improves on NSC and SIMCA. The misclassification rate of CA‐NSC is reduced by almost half compared with NSC in one of the real data analyses. Generally, correlation among variables will worsen the performance of NSC, even though the discriminatory information contained in the class centroid remains unchanged. If only correlation structure information is used (as in the case of SIMCA), the result will be satisfactory only when the correlation structure alone can provide sufficient information for classification. Copyright © 2015 John Wiley & Sons, Ltd.

20.