
1.

Classification of high-speed gas chromatography-mass spectrometry data by principal component analysis coupled with piecewise alignment and feature selection

Watson NE Vanwingerden MM Pierce KM Wright BW Synovec RE 《Journal of chromatography. A》2006,1129(1):111-118

A useful methodology is introduced for the analysis of data obtained via gas chromatography with mass spectrometry (GC-MS) utilizing a complete mass spectrum at each retention time interval in which a mass spectrum was collected. Principal component analysis (PCA) with preprocessing by both piecewise retention time alignment and analysis of variance (ANOVA) feature selection is applied to all mass channels collected. The methodology involves concatenating all concurrently measured individual m/z chromatograms from m/z 20 to 120 for each GC-MS separation into a row vector. All of the sample row vectors are incorporated into a matrix where each row is a sample vector. This matrix is piecewise aligned and reduced by ANOVA feature selection. Application of the preprocessing steps (retention time alignment and feature selection) to all mass channels collected during the chromatographic separation allows considerably more selective chemical information to be incorporated in the PCA classification, and is the primary novelty of the report. This methodology is objective and requires no knowledge of the specific analytes of interest, as in selective ion monitoring (SIM), and does not restrict the mass spectral data used, as in both SIM and total ion current (TIC) methods. Significantly, the methodology allows for the classification of data with low resolution in the chromatographic dimension because of the added selectivity from the complete mass spectral dimension. This allows for the successful classification of data over significantly decreased chromatographic separation times, since high-speed separations can be employed. The methodology is demonstrated through the analysis of a set of four differing gasoline samples that serve as model complex samples. For comparison, the gasoline samples are analyzed by GC-MS over both 10-min and 10-s separation times. The successfully classified 10-min GC-MS TIC data served as the benchmark analysis to compare to the 10-s data. 
When only alignment and feature selection were applied to the 10-s gasoline separations using GC-MS TIC data, PCA failed. PCA was successful for the 10-s gasoline separations when the methodology was applied with all the m/z information. With ANOVA feature selection, chromatographic regions with Fisher ratios greater than 1500 were retained in a new matrix and subjected to PCA, yielding successful classification for the 10-s separations.
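
The ANOVA feature selection step described above can be sketched as follows. This is a minimal illustration, assuming the Fisher ratio is the ordinary one-way ANOVA F statistic (between-class mean square divided by within-class mean square) computed column by column on the sample matrix; the function names, the toy data and the threshold of 10 are hypothetical and are not taken from the paper, which reports a threshold of 1500 on real GC-MS data.

```python
from statistics import mean


def fisher_ratios(rows, labels):
    """Return the one-way ANOVA F ratio for every column of `rows`."""
    classes = sorted(set(labels))
    n_samples, n_classes = len(rows), len(classes)
    ratios = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        grand_mean = mean(column)
        between = within = 0.0
        for c in classes:
            members = [x for x, lab in zip(column, labels) if lab == c]
            class_mean = mean(members)
            between += len(members) * (class_mean - grand_mean) ** 2
            within += sum((x - class_mean) ** 2 for x in members)
        ms_between = between / (n_classes - 1)
        ms_within = within / (n_samples - n_classes)
        if ms_within == 0.0:
            # Degenerate column: no within-class scatter at all.
            ratios.append(float("inf") if ms_between > 0.0 else 0.0)
        else:
            ratios.append(ms_between / ms_within)
    return ratios


def select_features(rows, labels, threshold):
    """Keep only the columns whose Fisher ratio exceeds `threshold`."""
    ratios = fisher_ratios(rows, labels)
    kept = [j for j, f in enumerate(ratios) if f > threshold]
    reduced = [[row[j] for j in kept] for row in rows]
    return kept, reduced


# Hypothetical toy matrix: each row is one sample, each column one
# (retention time, m/z) point of the concatenated chromatograms.
rows = [[1.0, 5.0], [1.1, 3.0], [3.0, 4.0], [3.1, 4.2]]
labels = ["a", "a", "b", "b"]
kept, reduced = select_features(rows, labels, threshold=10.0)
print(kept)
```

The reduced matrix would then be passed to PCA; that step is omitted here to keep the sketch dependency-free.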

2.

Optimisation of a new headspace mass spectrometry instrument. Discrimination of different geographical origin olive oils (Total citations: 1; self-citations: 0; citations by others: 1)

Cerrato Oliveros C Boggia R Casale M Armanino C Forina M 《Journal of chromatography. A》2005,1076(1-2):7-15

A fast headspace analysis instrument, consisting of an automatic sample introduction system directly coupled to a mass detector with no chromatographic separation, was assembled. A suitable and original response was computed to optimise, by experimental design, the measured signals for discrimination purposes. The volatile fractions of 105 extra virgin olive oils from five different Mediterranean areas were analysed. The raw information collected by this system was unravelled and explained by well-known chemometric techniques for display (principal component analysis), feature selection (stepwise linear discriminant analysis) and classification (linear discriminant analysis). In total, 93.4% of the samples were correctly classified and 90.5% were correctly predicted by cross-validation, whilst 80.0% of an external test set, created to fully validate the classification rule, were correctly assigned.

3.

Screening Brazilian C gasoline quality: application of the SIMCA chemometric method to gas chromatographic data

Flumignan DL Tininis AG Ferreira Fde O de Oliveira JE 《Analytica chimica acta》2007,595(1-2):128-135

A total of 2400 samples of commercial Brazilian C gasoline were collected over a 6-month period from different gas stations in São Paulo state, Brazil, and analysed with respect to 12 physicochemical parameters according to regulation 309 of the Brazilian Government Petroleum, Natural Gas and Biofuels Agency (ANP). The percentages (v/v) of hydrocarbons (olefins, aromatics and saturated) were also determined. Hierarchical cluster analysis (HCA) was employed to select 150 representative samples that exhibited the least similarity on the basis of their physicochemical parameters and hydrocarbon compositions. The chromatographic profiles of the selected samples were measured by gas chromatography with flame ionisation detection and analysed using the soft independent modelling of class analogy (SIMCA) method in order to create a classification scheme to identify gasolines that conform to ANP regulation 309. Following the optimisation of the SIMCA algorithm, it was possible to classify correctly 96% of the commercial gasoline samples in the training set of 100. In order to check the quality of the model, an external group of 50 gasoline samples (the prediction set) was analysed, and the developed SIMCA model classified 94% of these correctly. The developed chemometric method is recommended for screening commercial gasoline quality and detecting potential adulteration.

4.

Forward selection radial basis function networks applied to bacterial classification based on MALDI-TOF-MS

Zhang Z Wang D Harrington Pde B Voorhees KJ Rees J 《Talanta》2004,63(3):527-532

A forward selection improved radial basis function (RBF) network was applied to bacterial classification based on data obtained by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS). The classification of each bacterium cultured at different times was discussed and the effect of the RBF network parameters was investigated. The new method uses forward selection to prevent overfitting, with generalized cross-validation (GCV) as the model selection criterion (MSC). The original data were compressed by wavelet transformation to speed up network training and reduce the number of variables in the original MS data. The data were normalized prior to training and testing a network in order to define the region in which the neural network is trained, accelerate training, and narrow the range in which the parameters are selected. The one-out-of-n method was used to split the data set of p samples into a training set of size p−1 and a test set of size 1. With the improved method, the classification accuracies for the five bacteria discussed in the present paper are 87.5, 69.2, 80, 92.3, and 92.8%, respectively.
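
The one-out-of-n split described above (a training set of size p−1 and a test set of size 1, repeated for every sample) can be sketched as below. The classifier here is a nearest-centroid rule used purely as a stand-in; it is not the forward selection RBF network of the paper, and the function names and toy data are hypothetical.

```python
def leave_one_out(samples, labels):
    """Yield (train_x, train_y, test_x, test_y) with one sample held out."""
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        yield train_x, train_y, samples[i], labels[i]


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier: assign x to the class with the closest mean."""
    sums, counts = {}, {}
    for row, lab in zip(train_x, train_y):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[lab] = counts.get(lab, 0) + 1

    def squared_distance(lab):
        return sum((s / counts[lab] - v) ** 2 for s, v in zip(sums[lab], x))

    return min(sorted(sums), key=squared_distance)


def loo_accuracy(samples, labels):
    """Fraction of held-out samples that are classified correctly."""
    hits = sum(
        nearest_centroid_predict(train_x, train_y, x) == y
        for train_x, train_y, x, y in leave_one_out(samples, labels)
    )
    return hits / len(samples)


# Hypothetical two-class toy data (p = 6 samples, 2 variables each).
samples = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = ["a", "a", "a", "b", "b", "b"]
print(loo_accuracy(samples, labels))
```

Any classifier with a train-then-predict interface could replace `nearest_centroid_predict` without changing the splitting logic.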

5.

Automated evaluation of food colour by means of multivariate image analysis coupled to a wavelet-based classification algorithm

Andrea Antonelli Patrizia Fava Gian Carlo Franchini Alessandro Ulrici 《Analytica chimica acta》2004,515(1):3-13

This paper describes an approach for the colour-based classification of RGB images of inhomogeneous food matrices taken with a common digital CCD camera. The aim was to develop a feature selection/classification method that is independent of the specific food matrix analysed, in the sense that the variables most relevant for classifying the samples are selected blindly, with no a priori assumptions about the nature of the food matrix. A one-dimensional signal describing the colour content of each acquired digital image, which we have called a colourgram, is created as the contiguous sequence of the frequency distribution curves of the red, green and blue colour values, of related parameters (also including hue, saturation and intensity) and of the score values deriving from the PCA of the unfolded 3D image array, together with the corresponding loading values and eigenvalues. Once a sufficient number of digital images has been acquired, the corresponding colourgrams are analysed by means of a feature selection/classification algorithm based on the wavelet transform, wavelet packet transform for efficient pattern recognition (WPTER). This approach was tested on a series of samples of “pesto”, a typical Italian vegetable pasta sauce, which presents high colour variability, mainly due to technological variables (raw materials, processes) and to the degradation of chlorophylls during storage. Good classification results (100% of correctly classified objects with very parsimonious models) have been obtained, also in comparison with the visual evaluation results of a panel test.
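
The construction of a colourgram as a contiguous sequence of frequency distribution curves can be illustrated with a reduced sketch. It assumes 8-bit RGB pixels and includes only the red, green, blue and intensity histograms; hue, saturation, PCA scores, loadings and eigenvalues, which the paper also appends, are left out. The function names and the bin count are hypothetical.

```python
def histogram(values, bins=256, top=255):
    """Relative frequency distribution of `values` over `bins` equal bins."""
    counts = [0] * bins
    for v in values:
        index = min(int(v * bins / (top + 1)), bins - 1)
        counts[index] += 1
    total = len(values)
    return [c / total for c in counts]


def colourgram(pixels, bins=256):
    """Concatenate the R, G, B and intensity histograms into one signal."""
    reds = [p[0] for p in pixels]
    greens = [p[1] for p in pixels]
    blues = [p[2] for p in pixels]
    intensity = [(r + g + b) / 3 for r, g, b in pixels]
    signal = []
    for channel in (reds, greens, blues, intensity):
        signal.extend(histogram(channel, bins))
    return signal


# Hypothetical four-pixel "image", binned coarsely to keep the output short.
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
signal = colourgram(pixels, bins=4)
print(len(signal))
```

Because every image yields a signal of the same length regardless of its pixel dimensions, the colourgrams of many images can be stacked row-wise into one matrix for feature selection.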

6.

Swarm intelligence based wavelet coefficient feature selection for mass spectral classification: An application to proteomics data

Weixiang Zhao 《Analytica chimica acta》2009,651(1):15-3594

This paper introduces the ant colony algorithm, a novel swarm intelligence based optimization method, to select appropriate wavelet coefficients from mass spectral data as a new feature selection method for ovarian cancer diagnostics. After determining the proper parameters for the ant colony algorithm (ACA) based search, we perform the feature search 100 times with the number of selected features fixed at 5. The results of this study show: (1) the classification accuracy based on the five selected wavelet coefficients can reach up to 100% for all the training, validating and independent testing sets; (2) the eight most frequently selected wavelet coefficients of the 100 runs can provide 100% accuracy for the training set, 100% accuracy for the validating set, and 98.8% accuracy for the independent testing set, which suggests the robustness and accuracy of the proposed feature selection method; and (3) the mass spectral data corresponding to the eight frequently selected wavelet coefficients can be located by reverse wavelet transformation, and these located mass spectral data still maintain high classification accuracies (100% for the training set, 97.6% for the validating set, and 98.8% for the testing set) and also provide sufficient physical and medical meaning for future ovarian cancer mechanism studies. Furthermore, the corresponding mass spectral data (potential biomarkers) are in good agreement with other studies which have used the same sample set. Together these results suggest this feature extraction strategy will benefit the development of intelligent and real-time spectroscopy instrumentation based diagnosis and monitoring systems.

7.

Unsupervised parameter optimization for automated retention time alignment of severely shifted gas chromatographic data using the piecewise alignment algorithm

Pierce KM Wright BW Synovec RE 《Journal of chromatography. A》2007,1141(1):106-116

Simulated chromatographic separations were used to study the performance of piecewise retention time alignment and to demonstrate automated unsupervised (without a training set) parameter optimization. The average correlation coefficient between the target chromatogram and all remaining chromatograms in the data set was used to optimize the alignment parameters. This approach frees the user from providing class information and makes the alignment algorithm applicable to classifying completely unknown data sets. The average peak in the raw simulated data set was shifted up to two peak-widths-at-base (average relative shift=2.0) and after alignment the average relative shift was improved to 0.3. Piecewise alignment was applied to severely shifted GC separations of gasolines and reformate distillation fraction samples. The average relative shifts in the raw gasolines and reformates data were 4.7 and 1.5, respectively, but after alignment improved to 0.5 and 0.4, respectively. The effect of piecewise alignment on peak heights and peak areas is also reported. The average relative difference in peak height was -0.20%. The average absolute relative difference in area was 0.15%.
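
The idea of scoring alignment parameters by the average correlation with a target chromatogram can be sketched as follows. This is a simplified illustration, not the published piecewise alignment algorithm: each window is shifted rigidly by the integer lag that maximises its Pearson correlation with the target, edges are padded with the nearest value, and the two tuning parameters (window length and maximum lag) are chosen by a plain grid search. All function names and the simulated Gaussian peaks are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either signal is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def piecewise_align(target, sample, window, max_lag):
    """Shift each window of `sample` by the lag that best matches `target`."""
    n = len(sample)
    lags = sorted(range(-max_lag, max_lag + 1), key=abs)  # prefer small shifts
    aligned = []
    for start in range(0, n, window):
        stop = min(start + window, n)
        reference = target[start:stop]
        best_lag, best_r = 0, float("-inf")
        for lag in lags:
            segment = [sample[min(max(i - lag, 0), n - 1)] for i in range(start, stop)]
            r = pearson(reference, segment)
            if r > best_r:
                best_lag, best_r = lag, r
        aligned.extend(sample[min(max(i - best_lag, 0), n - 1)] for i in range(start, stop))
    return aligned


def optimise_parameters(target, samples, windows, max_lags):
    """Grid search for the (window, max_lag) pair with the highest average correlation."""
    best = None
    for window in windows:
        for max_lag in max_lags:
            aligned = [piecewise_align(target, s, window, max_lag) for s in samples]
            score = mean(pearson(target, a) for a in aligned)
            if best is None or score > best[0]:
                best = (score, window, max_lag)
    return best


# Simulated single-peak chromatograms: the sample peak elutes 3 points late.
target = [math.exp(-0.5 * ((i - 20) / 2.0) ** 2) for i in range(40)]
sample = [math.exp(-0.5 * ((i - 23) / 2.0) ** 2) for i in range(40)]
print(optimise_parameters(target, [sample], windows=[10, 40], max_lags=[1, 5]))
```

No class labels enter the score, which is what makes this kind of optimisation unsupervised.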

8.

Determination of detergent and dispersant additives in gasoline by ring-oven and near infrared hyperspectral imaging

Lívia Rodrigues e Brito Michelle P.F. da Silva Jarbas J.R. Rohwedder Celio Pasquini Fernanda A. Honorato Maria Fernanda Pimentel 《Analytica chimica acta》2015

A method using the ring-oven technique for pre-concentration in filter paper discs and near infrared hyperspectral imaging is proposed to identify four detergent and dispersant additives, and to determine their concentration in gasoline. Different approaches were used to select the best image data processing in order to gather the relevant spectral information. This was attained by selecting the pixels of the region of interest (ROI), using a pre-calculated threshold value of the PCA scores arranged as histograms, to select the spectra set; summing up the selected spectra to achieve representativeness; and compensating for the superimposed filter paper spectral information, also supported by score histograms for each individual sample. The best classification model was achieved using linear discriminant analysis and genetic algorithm (LDA/GA), whose correct classification rate in the external validation set was 92%. Prior classification of the type of additive present in the gasoline is necessary to define the PLS model required for its quantitative determination. Considering that two of the additives studied present high spectral similarity, a PLS regression model was constructed to predict their content in gasoline, while two additional models were used for the remaining additives. The results for the external validation of these regression models showed a mean percentage error of prediction varying from 5 to 15%.

9.

Diagnostic pattern recognition on gene-expression profile data by using one-class classification

Xu Y Brereton RG 《Journal of chemical information and modeling》2005,45(5):1392-1401


10.

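Classification and identification of different cigarette brands based on near-infrared spectroscopy and OPLS-DA (Total citations: 1; self-citations: 0; citations by others: 1)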

潘曦 刘辉 王昊 刘静 何昀潞 黄伟初 邱昌桂 《分析测试学报》 (Journal of Instrumental Analysis) 2020,39(11):1385-1391

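To classify and identify cigarette brands accurately, a new method is proposed for the rapid identification of cigarette brands based on near-infrared spectroscopy (NIRS) combined with supervised pattern recognition. Cut-tobacco spectra were preprocessed using standard normal variate transformation (SNV), multiplicative scatter correction (MSC), first derivative (FD), second derivative (SD), Savitzky-Golay smoothing (SG) and combinations of these methods. Three pattern recognition methods, principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA) and orthogonal partial least squares discriminant analysis (OPLS-DA), were combined with the near-infrared spectra to classify cut tobacco of different brands, with the classification accuracy used as the evaluation index. The experimental results show that: (1) the principal component score plots of the cut-tobacco near-infrared spectra overlap without clear separation, and PCA could not identify the finished cut tobacco of the 5 brands; (2) the PLS-DA model built on spectra preprocessed with MSC+FD gave good recognition, with classification accuracies of 100% and 98.3% for the calibration and test sets, respectively; (3) the OPLS-DA model built on spectra preprocessed with MSC+SD gave the best pattern recognition, with a goodness of fit for the independent variables (R2X), a goodness of fit for the dependent variables (R2Y) and a predictive index (Q2) of 0.485, 0.907 and 0.748, respectively, and a classification accuracy of 100% for both the calibration and test sets. This indicates that the brand classification model for cut tobacco built with near-infrared spectroscopy combined with the supervised pattern recognition method OPLS-DA is efficient, fast, accurate and non-destructive, and provides a new rapid identification method for classifying cigarette cut tobacco.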
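
Two of the preprocessing steps named above, SNV and MSC, can be sketched in a few lines. This is a minimal illustration of the textbook definitions (SNV centres and scales each spectrum by its own mean and standard deviation; MSC regresses each spectrum on the mean spectrum and removes the fitted offset and slope), not the authors' implementation. The function names and toy spectra are hypothetical, and constant spectra, which would cause a division by zero, are not handled.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by itself."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    reference = [mean(column) for column in zip(*spectra)]
    ref_mean = mean(reference)
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for spectrum in spectra:
        spec_mean = mean(spectrum)
        # Least-squares fit: spectrum ≈ intercept + slope * reference
        slope = sum((r - ref_mean) * (x - spec_mean)
                    for r, x in zip(reference, spectrum)) / sxx
        intercept = spec_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in spectrum])
    return corrected


# Hypothetical spectra: the second is the first with added offset and gain,
# mimicking a pure scatter effect.
base = [1.0, 2.0, 4.0, 3.0]
scattered = [2.0 * x + 1.0 for x in base]
print(snv(base))
print(msc([base, scattered]))
```

In this idealised case both corrections map the two spectra onto the same curve; with real spectra the residual differences are what the classifier works on.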

11.

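Application of partial least squares to the identification of tea by infrared spectroscopy (Total citations: 1; self-citations: 0; citations by others: 1)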

顾小红 冯宇 汤坚 《分析科学学报》 (Journal of Analytical Science) 2008,24(2):131-135

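Diffuse reflectance Fourier transform infrared spectroscopy (FTIR) combined with principal component analysis (PCA), partial least squares (PLS) and soft independent modelling of class analogy (SIMCA) was used to classify and discriminate thirteen kinds of tea. The results show that preprocessing the raw spectra with multiplicative scatter correction (MSC) improves the classification performance of the pattern recognition techniques. On this basis, the infrared spectra of tea in the 1900 to 900 cm⁻¹ region were selected to build the recognition models, and all three methods gave satisfactory classification results. In discriminating all 130 samples of the test set, PCA failed on only two classes of samples, SIMCA achieved recognition and rejection rates above 90%, and PLS performed best, with every sample assigned to the correct class. These results show that Fourier transform infrared spectroscopy combined with chemometric methods allows rapid identification of tea varieties, which offers a new approach to the objective evaluation of tea.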

12.

A support vector machine based prediction model for drug-induced phospholipidosis

解扬 张会 杨胜勇 《化学研究与应用》 (Chemical Research and Application) 2011,23(6):696-701

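In this paper, a support vector machine method combining a genetic algorithm with the conjugate gradient method (GA-CG-SVM) was used to build a classification model for predicting drug-induced phospholipidosis. The descriptors were first optimized, and 19 descriptors were selected for model construction. The resulting model had a prediction accuracy of 81.6% on the training set and 87.5% on the test set, indicating that the SVM classification model not only correctly predicts drug-induced phospholipidosis for the training set, but also for other compounds...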

13.

Anionic forensic signatures for sample matching of potassium cyanide using high performance ion chromatography and chemometrics

Fraga CG Farmer OT Carman AJ 《Talanta》2011,83(4):1166-1172

Potassium cyanide was used as a model toxicant to determine the feasibility of using anionic impurities as a forensic signature for matching cyanide salts back to their source. In this study, portions of eight KCN stocks originating from four countries were separately dissolved in water and analyzed by high performance ion chromatography (HPIC) using an anion exchange column and conductivity detection. Sixty KCN aqueous samples were produced from the eight stocks and analyzed for 11 anionic impurities. Hierarchical cluster analysis and principal component analysis were used to demonstrate that KCN samples cluster according to source based on the concentrations of their anionic impurities. The Fisher-ratio method and degree-of-class separation (DCS) were used for feature selection on a training set of KCN samples in order to optimize sample clustering. The optimal subset of anions needed for sample classification was determined to be sulfate, oxalate, phosphate, and an unknown anion named unk5. Using K-nearest neighbors (KNN) and the optimal subset of anions, KCN test samples from different KCN stocks were correctly determined to be manufactured in the United States. In addition, KCN samples from stocks manufactured in Belgium, Germany, and the Czech Republic were all correctly matched back to their original stocks because each stock had a unique anionic impurity profile. The application of the Fisher-ratio method and DCS for feature selection improved the accuracy and confidence of sample classification by KNN.
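
How restricting KNN to a selected subset of variables can change a classification is shown in the sketch below. It assumes Euclidean distance and simple majority voting; the two-variable data set is invented for illustration (the first variable separates the classes, the second is noise) and has no connection to the measured anion concentrations in the study.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, x, k=3, features=None):
    """Majority vote among the k nearest training samples.

    `features` is an optional list of column indices to use; None uses all.
    """
    def project(row):
        return row if features is None else [row[j] for j in features]

    query = project(x)
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: dist(project(pair[0]), query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical data: column 0 is informative, column 1 is a noisy variable
# on a much larger scale.
train_x = [[1.0, 50.0], [1.2, 10.0], [0.9, 90.0],
           [5.0, 40.0], [5.2, 80.0], [4.8, 20.0]]
train_y = ["A", "A", "A", "B", "B", "B"]
query = [1.1, 60.0]

print(knn_predict(train_x, train_y, query, k=3))                # all variables
print(knn_predict(train_x, train_y, query, k=3, features=[0]))  # selected subset
```

With all variables the noisy column dominates the distances; after selection the query is assigned according to the informative column alone.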

14.

Pattern classification of mineral oils based on combined multi-order derivative Raman spectra

卫辰洁 王继芬 张波 董泽 管建皓 《分析测试学报》 (Journal of Instrumental Analysis) 2021,40(5):747-753

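To achieve rapid, accurate and non-destructive identification of heavy mineral oil evidence in forensic science, this paper proposes a method based on spectral analysis that combines multi-order derivative spectral data. Eighty heavy mineral oil samples of different models and manufacturers were collected; Fourier transform Raman spectroscopy was used to acquire the original and derivative spectral data of the samples, and classification models were built with chemometrics. In the classification model combining principal component analysis (PCA) with a radial basis function (RBF) neural network, the training-set accuracies for the original, first-derivative and second-derivative spectra alone were 80.0%, 86.7% and 86.2%, and the test-set accuracies were 73.3%, 80.0% and 72.7%, respectively. For the combined data (original + first derivative, original + second derivative, and first derivative + second derivative), the training-set accuracies were 97.0%, 96.7% and 100%, and the test-set accuracies were 85.7%, 90.0% and 100%, respectively. The results show that classification models built on combinations of derivative and original spectra are more accurate. The PCA-RBF model built on the first derivative + second derivative data gave the best result, with an accuracy of 100%. The K-nearest neighbor model, by contrast, had low overall classification accuracy because it was affected by the uneven distribution of samples. Classification models built on combined derivative and original spectral data allow rapid, accurate and non-destructive identification of heavy mineral oil samples, and may serve as a reference for applying combined-spectra techniques in forensic science and other analytical fields.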
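
Combining derivative spectra into a single feature vector, as described above, can be sketched as below. The derivatives here are plain finite differences (central inside the spectrum, one-sided at the two ends) on an evenly spaced axis; the abstract does not state how the derivatives were computed, so this choice, the function names and the toy spectrum are assumptions.

```python
def derivative(spectrum, step=1.0):
    """Finite-difference derivative of an evenly sampled spectrum."""
    n = len(spectrum)
    result = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        result.append((spectrum[hi] - spectrum[lo]) / ((hi - lo) * step))
    return result


def combined_derivative_features(spectrum, step=1.0):
    """Concatenate the first- and second-derivative spectra."""
    first = derivative(spectrum, step)
    second = derivative(first, step)
    return first + second


# Hypothetical five-point spectrum.
spectrum = [0.0, 1.0, 4.0, 9.0, 16.0]
print(combined_derivative_features(spectrum))
```

The concatenated vector has twice the length of the original spectrum and would be fed to PCA before classification.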

15.

Cluster resolution: a metric for automated, objective and optimized feature selection in chemometric modeling

Sinkov NA Harynuk JJ 《Talanta》2011,83(4):1079-1087

A novel metric termed cluster resolution is presented. This metric compares the separation of clusters of data points while simultaneously considering the shapes of the clusters and their relative orientations. Using cluster resolution in conjunction with an objective variable ranking metric allows for fully automated feature selection for the construction of chemometric models. The metric is based upon considering the maximum size of confidence ellipses around clusters of points representing different classes of objects that can be constructed without any overlap of the ellipses. For demonstration purposes we utilized PCA to classify samples of gasoline based upon their octane rating. The entire GC-MS chromatogram of each sample comprising over 2 × 10⁶ variables was considered. As an example, automated ranking by ANOVA was applied followed by a forward selection approach to choose variables for inclusion. This approach can be generally applied to feature selection for a variety of applications and represents a significant step towards the development of fully automated, objective construction of chemometric models.

16.

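PCA-LDA method for discriminating human blood from canine blood based on non-contact Raman spectroscopy (Total citations: 2; self-citations: 0; citations by others: 2)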

郑祥权 廖鑫 徐溢 洪明坚 《高等学校化学学报》 (Chemical Journal of Chinese Universities) 2017,38(4)

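Raman spectroscopy was combined with statistical methods to build a species discrimination model for human and canine blood, achieving efficient, non-destructive identification of blood samples from different species. Blood samples were measured in the non-destructive Raman mode, and the effects of the anticoagulant tube material, focus position and exposure time on the Raman spectra of the blood samples were examined. Under the optimized conditions (excitation wavelength 632.8 nm, spectral scan range 200 to 1800 cm⁻¹, power attenuation 50%, exposure time 5 s and 2 accumulations), Raman spectra of the blood samples were obtained under non-destructive conditions. To address the complex composition of blood samples and the high background of the Raman signal, a preprocessing method based on wavelet-transform denoising followed by piecewise polynomial baseline correction was proposed, which effectively resolved the high noise and baseline drift of the Raman spectra. Thirty normal human blood samples and 33 beagle blood samples were chosen as the training set, and 5 normal human and 5 beagle blood samples as the test set. With a model combining principal component analysis (PCA) and linear discriminant analysis (LDA), the classification accuracy reached 95.23% for the training set and 90.00% for the blind test set. This test method, based on non-contact Raman spectra of blood samples and the PCA-LDA discrimination model, has broad application value and prospects in fields involving the non-destructive identification of blood, such as import and export inspection and quarantine.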

17.

Predictive activity profiling of drugs by topological-fragment-spectra-based support vector machines

Kawai K Fujishima S Takahashi Y 《Journal of chemical information and modeling》2008,48(6):1152-1160

Aiming at the prediction of pleiotropic effects of drugs, we have investigated the multilabel classification of drugs that have one or more of 100 different kinds of activity labels. Structural feature representation of each drug molecule was based on the topological fragment spectra method, which was proposed in our previous work. Support vector machine (SVM) was used for the classification and the prediction of their activity classes. Multilabel classification was carried out by a set of the SVM classifiers. The collective SVM classifiers were trained with a training set of 59,180 compounds and validated by another set (validation set) of 29,590 compounds. For a test set that consists of 9,864 compounds, the classifiers correctly classified 80.8% of the drugs into their own active classes. The SVM classifiers also successfully performed predictions of the activity spectra for multilabel compounds.

18.

Pattern recognition of Inductively Coupled Plasma Atomic Emission Spectroscopy of human scalp hair for discriminating between healthy and Hepatitis C patients

Gavin R. Lloyd Mohammad Wasim 《Analytica chimica acta》2009,649(1):33-64

Inductively Coupled Plasma Atomic Emission Spectroscopy measurements of six trace elements were performed on the scalp hair of 155 donors, 73 of whom had been diagnosed with Hepatitis C and 82 of whom were controls. Principal Components Analysis (PCA) was employed to visualise the separation between groups and show the relationship between the elements and the diseased state. Pattern recognition methods for classification involving Quadratic Discriminant Analysis and Partial Least Squares Discriminant Analysis (PLS-DA) were applied to the data. The number of significant components for both PCA and PLS was determined using the bootstrap. The stability of the training set models was determined by repeatedly splitting the data into training and test sets and employing visualisation for two-component models; the percent classification ability (CC), predictive ability (PA) and model stability (MS) were computed for test and training sets.

19.

Combining visible and near-infrared spectroscopy with chemometrics to trace muscles from an autochthonous breed of pig produced in Uruguay: a feasibility study

Cozzolino D Vadell A Ballesteros F Galietta G Barlocco N 《Analytical and bioanalytical chemistry》2006,385(5):931-936

Visible (Vis) and near-infrared reflectance (NIR) spectroscopy combined with chemometrics was explored as a tool to trace muscles from autochthonous and crossbreed pigs from Uruguay. Muscles were sourced from two breeds, namely, the Pampa-Rocha (PR) and the Pampa-Rocha x Duroc (PRxD) crossbreed. Minced muscles were scanned in reflectance in the Vis and NIR regions (400–2,500 nm) with a monochromator instrument. Principal component analysis (PCA), discriminant partial least square regression (DPLS), linear discriminant analysis (LDA) based on PCA scores and soft independent modelling of class analogy (SIMCA) were used to identify the origin of the muscles based on Vis and NIR data. Full cross validation was used as the validation method when classification models were developed. DPLS correctly classified 87% of PR and 78% of PRxD muscle samples. LDA calibration models correctly classified 87 and 67% of muscles as PR and PRxD, respectively. SIMCA correctly classified 100% of PR muscles. The results demonstrated the usefulness of Vis and NIR spectra combined with chemometrics as a rapid method for authentication and identification of muscles according to the breed of pig.

20.

Activity prediction of dihydrofolate reductase inhibitors by machine learning methods

陈晓梅 饶含兵 黄文丽 李泽荣 《高等学校化学学报》 (Chemical Journal of Chinese Universities) 2007,28(11):2171-2178

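Support vector machines, artificial neural networks, regularized logistic regression and K-nearest neighbors were each used to build activity classification models for 761 dihydrofolate reductase inhibitors. Constitutional and topological descriptors were used to characterize the molecular structure and physicochemical properties of the inhibitors, the Kennard-Stone method was used to design the training set, and Metropolis Monte Carlo simulated annealing was used for variable selection. The results show that the support vector machine outperformed the other machine learning methods; the best model gave good predictions, with a prediction accuracy of 91.62%. This indicates that, with suitable training set design and variable selection, the support vector machine method is well suited to predicting the activity class of dihydrofolate reductase inhibitors.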
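
The Kennard-Stone method mentioned above for designing the training set can be sketched as follows. This is the standard form of the algorithm as commonly described (start from the two most distant samples, then repeatedly add the sample whose distance to its nearest already-selected sample is largest), using Euclidean distance; the abstract gives no implementation details, and the function name and one-dimensional toy descriptors are hypothetical.

```python
from math import dist


def kennard_stone(points, n_select):
    """Return indices of `n_select` points chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of points")

    # Seed with the two points that are farthest apart.
    best = (-1.0, 0, 1)
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(points[i], points[j])
            if d > best[0]:
                best = (d, i, j)
    selected = [best[1], best[2]]
    remaining = [i for i in range(n) if i not in selected]

    # Add the point that is farthest from its nearest selected neighbour.
    while len(selected) < n_select:
        candidate = max(
            remaining,
            key=lambda i: min(dist(points[i], points[s]) for s in selected),
        )
        selected.append(candidate)
        remaining.remove(candidate)
    return selected


# Hypothetical descriptor vectors for five compounds.
points = [(0.0,), (1.0,), (10.0,), (5.0,), (9.0,)]
print(kennard_stone(points, 3))
```

The selected indices would form the training set, with the remaining compounds held back for testing, so that the training set spans the descriptor space as evenly as possible.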