Similar articles
Found 20 similar articles (search time: 31 ms)
1.
This paper presents a preliminary study on building discriminant models from solid-state NMR spectrometry data to detect the presence of acetaminophen in over-the-counter pharmaceutical formulations. The dataset, containing 11 spectra of pure substances and 21 spectra of various formulations, was processed by partial least squares discriminant analysis (PLS-DA). The resulting model discriminated the samples successfully, and its quality parameters were acceptable. Standard normal variate preprocessing was found to have almost no influence on the unsupervised investigation of the dataset. The influence of variable selection by uninformative variable elimination by PLS was also studied; it reduced the dataset from 7601 variables to around 300 informative variables but did not improve model performance. The results show that well-performing PLS-DA models can be constructed from such small datasets without a full experimental design.
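
The standard normal variate (SNV) preprocessing mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual definition of SNV (each spectrum is centered on its own mean and divided by its own standard deviation); the function name `snv` and the toy intensities are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    divide by its own sample standard deviation (row-wise scaling)."""
    m = mean(spectrum)
    s = stdev(spectrum)
    if s == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(x - m) / s for x in spectrum]


# Toy data (hypothetical): the second "spectrum" is the first one
# multiplied by 10, mimicking a purely multiplicative intensity effect.
spectra = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
]
corrected = [snv(row) for row in spectra]
```

After SNV the two toy spectra coincide, because the transform removes per-spectrum offset and scale differences.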

2.
Paper spray mass spectrometry (PS-MS) combined with partial least squares discriminant analysis (PLS-DA) was applied for the first time in a forensic context to the fast and effective differentiation of beers. Eight brands of American standard lager beers produced by four breweries (141 samples from 55 batches) were studied with the aim of differentiating them according to market price. The three leading brands in the Brazilian beer market, which have been subject to fraud, were modeled as the higher-price class, while the five brands most used for counterfeiting were modeled as the lower-price class. Parameters affecting paper spray ionization were examined and optimized. The best MS signal stability and intensity were obtained in positive ion mode, with PS(+) mass spectra characterized by intense pairs of signals corresponding to sodium and potassium adducts of malto-oligosaccharides. Discrimination was apparent neither by visual inspection nor by principal component analysis (PCA). However, supervised classification models provided high sensitivity and specificity. A PLS-DA model using full-scan mass spectra was improved by variable selection with ordered predictors selection (OPS), providing a 100% reliability rate and reducing the number of variables from 1701 to 60. This model was interpreted by identifying the fifteen variables with the most significant VIP (variable importance in projection) scores, which were therefore considered diagnostic ions for this type of beer counterfeiting.

3.
Partial least squares-discriminant analysis (PLS-DA) is so widely used to analyze metabolomics datasets (indeed, it is the best-known tool for classification and regression in metabolomics) that not all researchers are fully aware of alternative multivariate classification algorithms. This may in part be due to the widespread availability of PLS-DA in most well-known statistical software packages, where it is very easy to apply with the default settings. In addition, one of the perceived advantages of PLS-DA is its ability to analyze highly collinear and noisy data. Furthermore, the calibration model provides a variety of useful statistics, such as prediction accuracy, as well as scores and loadings plots. However, the method may give misleading results, largely because of a lack of suitable statistical validation, when used by non-experts who are not aware of its potential limitations in metabolomics. This tutorial review aims to provide an introductory overview of several straightforward statistical methods, such as principal component-discriminant function analysis (PC-DFA), support vector machines (SVM) and random forests (RF), which could very easily be used either to augment PLS or as alternative supervised learning methods to PLS-DA. These methods are particularly appropriate for the large, highly complex data sets that are common in metabolomics studies, where the number of variables often far exceeds the number of samples. In addition, these alternative techniques may be useful for generating parsimonious models through feature selection and data reduction, as well as for providing more favorable results. We sincerely hope that the general reader is left with little doubt that there are several promising and readily available alternatives to PLS-DA for analyzing large and highly complex data sets.

4.
Osteonecrosis of the femoral head (ONFH) is a disease characterized by impaired blood flow in the bone. Its pathogenesis is still unknown, which makes exact diagnosis difficult and heavily dependent on experience. Exploring molecular-level information with modern spectroscopy may help to uncover the underlying pathogenesis and find diagnostic applications in clinical medicine. This study focuses on the combination of near-infrared (NIR) spectroscopy and classification models for discriminating ONFH from normal tissue. A total of 128 surgical specimens were prepared, and NIR spectra were recorded with an integrating sphere. The experimental data set was divided into three subsets: a training set, a validation set, and a test set. The successive projections algorithm combined with linear discriminant analysis (SPA-LDA) was used to compress the variables and build the diagnostic model. Partial least squares-discriminant analysis (PLS-DA) was used as the reference method, and principal component analysis (PCA) was used for exploratory analysis. The results showed that, compared with PLS-DA, SPA-LDA provided a more parsimonious model using only seven variables and achieved better performance, with sensitivities of 90.5% and 85% and specificities of 100% and 95.5% for the validation and test sets, respectively. This indicates that NIR spectroscopy combined with the SPA-LDA algorithm is a feasible auxiliary tool for discriminating ONFH from normal tissue.
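
The sensitivity and specificity figures quoted in this abstract follow the standard definitions (true positives over all actual positives, and true negatives over all actual negatives). A minimal sketch of that arithmetic is given below; the function name and the toy labels are hypothetical and do not reproduce the paper's data.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a two-class problem.

    sensitivity = TP / (TP + FN)   fraction of positives that were detected
    specificity = TN / (TN + FP)   fraction of negatives correctly rejected
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (hypothetical): 1 = diseased tissue, 0 = normal tissue.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

In this toy case one of the four positive samples is missed and no negative sample is misclassified.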

5.
A large suite of natural carbonate, fluorite and silicate geological materials was studied using laser-induced breakdown spectroscopy (LIBS). Both single- and double-pulse LIBS spectra were acquired using close-contact benchtop and standoff (25 m) LIBS systems. Principal components analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were used to identify the distinguishing characteristics of the geological samples and to classify the materials. Excellent discrimination was achieved for all sample types using PLS-DA, and several techniques for improving sample classification were identified. The laboratory double-pulse LIBS system did not provide any advantage for sample classification over the single-pulse LIBS system, except in the case of the soil samples. The standoff LIBS system provided results comparable to those of the laboratory systems. This work also demonstrates how PCA can be used to identify spectral differences between similar sample types based on minor impurities.

6.
Variable scaling alters the covariance structure of data, affecting the outcome of multivariate analysis and calibration. Here we present a new method, variable stability (VAST) scaling, which weights each variable according to a metric of its stability. The beneficial effect of VAST scaling is demonstrated for a data set of 1H NMR spectra of urine acquired as part of a metabonomic study into the effects of unilateral nephrectomy in an animal model. The application of VAST scaling improved the class distinction and predictive power of partial least squares discriminant analysis (PLS-DA) models. The effects of other data scaling and pre-processing methods, such as orthogonal signal correction (OSC), were also tested. VAST scaling produced the most robust models in terms of class prediction, outperforming OSC in this respect. As a result, the subtle but consistent metabolic perturbation caused by unilateral nephrectomy could be accurately characterised despite the presence of much greater biological differences caused by normal physiological variation. VAST scaling presents itself as an interpretable, robust and easily implemented data treatment for the enhancement of multivariate data analysis.
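
The idea of weighting each variable by its stability can be sketched as follows. The abstract does not state the formula, so this sketch assumes the form in which VAST scaling is commonly described: each variable is autoscaled and then multiplied by the inverse of its coefficient of variation (mean divided by standard deviation). The function name `vast_scale` and the toy matrix are hypothetical.

```python
from statistics import mean, stdev


def vast_scale(X):
    """Scale each column (variable) of X, a list of sample rows.

    Assumed form: autoscaling times a stability weight, where
    stability = mean / standard deviation (inverse coefficient of variation).
    """
    n_vars = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_vars)]
    scaled_columns = []
    for col in columns:
        m, s = mean(col), stdev(col)
        if s == 0:
            raise ValueError("cannot scale a constant variable")
        stability = m / s  # large when the variable varies little around its mean
        scaled_columns.append([(x - m) / s * stability for x in col])
    # Transpose back to one row per sample.
    return [[scaled_columns[j][i] for j in range(n_vars)] for i in range(len(X))]


# Toy matrix (hypothetical): column 0 is stable (mean 11, sd 1),
# column 1 is much more variable (mean 5, sd 4).
X = [[10.0, 1.0], [11.0, 5.0], [12.0, 9.0]]
X_vast = vast_scale(X)
```

With these toy values the stable first variable receives a weight of 11, while the noisier second variable receives a weight of 1.25, so the stable variable dominates after scaling.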

7.
LC/MS is an analytical technique that, owing to its high sensitivity, has become increasingly popular for generating metabolic signatures of biological samples and for building metabolic databases. However, to create robust and interpretable (transparent) multivariate models for comparing many samples, the data must fulfil certain criteria: (i) each sample is characterized by the same number of variables, (ii) each of these variables is represented across all observations, and (iii) a variable in one sample has the same biological meaning, or represents the same metabolite, in all other samples. In addition, the resulting models must be able to make predictions for, e.g., related and independent samples characterized in the same way as the model samples. The method described here involves the construction of a representative data set, including automatic peak detection, alignment, setting of retention time windows, summing in the chromatographic dimension, and data compression by means of alternating regression, where the relevant metabolic variation is retained for further modelling using multivariate analysis. This approach has the advantage of allowing large numbers of samples to be compared on the basis of their LC/MS metabolic profiles, and also of providing a means of interpreting the investigated biological system. This includes finding relevant systematic patterns among samples, identifying influential variables, verifying the findings in the raw data, and finally using the models for predictions. The strategy was applied to a population study using urine samples from two cohorts, Shanxi (People's Republic of China) and Honolulu (USA). The results showed that evaluating the extracted data using partial least squares discriminant analysis (PLS-DA) provided a robust, predictive and transparent model of the metabolic differences between the two populations. The findings suggest that this is a general approach for the handling, analysis, and evaluation of large metabolic LC/MS data sets.

8.
Ramadan Z, Jacobs D, Grigorov M, Kochhar S. Talanta, 2006, 68(5): 1683-1691
The aim of this study was to evaluate evolutionary variable selection methods for improving the classification of 1H nuclear magnetic resonance (NMR) metabonomic profiles, and to identify the metabolites responsible for the classification. Human plasma, urine, and saliva from a group of 150 healthy male and female subjects were subjected to 1H NMR-based metabonomic analysis. The 1H NMR spectra were analyzed using two pattern recognition methods, principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA), to identify metabolites responsible for gender differences. The use of genetic algorithms (GA) for variable selection was found to enhance the classification performance of the PLS-DA models. The loading plots obtained by PCA and PLS-DA were compared, and various metabolites responsible for the observed separations were identified. These results demonstrate that the approach is capable of identifying the metabolites that are important for discriminating between classes of individuals with similar physiological conditions.

9.
A metabonomic study based on the application of multivariate curve resolution-alternating least squares (MCR-ALS) to three-way data sets obtained by liquid chromatography coupled to mass spectrometry (LC-MS) was carried out for Rambo and Raf tomato cultivars treated with the pesticide carbofuran. Samples were collected over a 21-day period after treatment and analyzed by LC-MS in scan mode, along with the corresponding blank samples. MCR-ALS was then applied to the three-way data sets using column-wise augmented matrices, and the evolutionary profiles as a function of time after treatment were estimated for the metabolites present in both cultivars, together with estimates of their pure spectra. A comparative study using these estimates showed that some of the metabolites behaved differently in the two cultivars after treatment. Since all treated and untreated Rambo and Raf samples were collected according to the same sampling protocol and in a similar state of maturation, any difference in behavior between profiles can be interpreted as an effect of the presence of the pesticide and of the cultivar. Based on this hypothesis, several PLS-DA approaches were tested to check whether samples could be classified using the MCR estimates of the metabolites. The PLS-DA models for classifying treated versus non-treated (blank) samples were the best ones obtained (98.44% correct classifications for the validation set), which supports the existence of stress effects related to carbofuran treatment. In addition, excellent discrimination among the four groups could be attained (89.06% correct classifications for the validation set).

10.
Paris polyphylla Smith var. yunnanensis (Franch.) Hand.-Mazz has multiple therapeutic properties, and its origin may affect its clinical efficacy. Tracing the geographical origin is therefore important for the authentication and quality assessment of this species. A total of 177 wild samples collected from central, southeast and northwest Yunnan Province, China, were analyzed by single analytical methods and by data fusion strategies (low- and mid-level) using Fourier transform mid-infrared (FT-MIR) and ultraviolet-visible (UV–vis) spectroscopies combined with chemometrics (partial least squares discriminant analysis (PLS-DA) and support vector machines with grid search (SVM-GS)) in order to categorize samples from different geographical origins. According to the results, the mid-level data fusion strategy, based on latent variables selected by PLS-DA, gave better generalization performance and accuracy than either the single analytical methods or the low-level data fusion strategy. Accuracy rates were almost 100% when PLS-DA and SVM-GS were used to classify samples from the southeast and northwest districts based on the mid-level data set. For samples collected from central Yunnan, which were divided into seven categories in this paper, the accuracy rates of the training and test sets for PLS-DA and SVM-GS were satisfactory (>87%). Based on the mid-level data set, the classification results of both PLS-DA and SVM-GS showed satisfactory accuracy for all 177 samples. Additionally, the mid-level data set required as few parameters as possible, which suggests that the method is robust and generalizes well. A comprehensive method was thus established for tracing the origin of wild P. polyphylla Smith var. yunnanensis, which is meaningful for the quality control of herbal medicines.

11.
Coffee samples were analyzed by GC/MS in order to determine the most important peaks for discriminating between the Arabica and Robusta varieties. The peak tables resulting from chromatographic analysis were aligned and pretreated before being submitted to multivariate analysis. A rapid and easy-to-perform peak alignment procedure, which does not require advanced programming skills, was compared with the tedious manual alignment procedure. The influence of three types of data pretreatment (normalization, logarithmic transformation and square root transformation) and of their combinations on the variables selected as most important by the regression coefficients of partial least squares-discriminant analysis (PLS-DA) is shown. Test samples different from those used in the calibration, together with a comparison with the substances already known to be responsible for discriminating Arabica and Robusta coffees, were used to determine the best pretreatments for both datasets. The pretreatment consisting of square root transformation followed by normalization (RN) was chosen as the most appropriate. The results showed that the much quicker automated alignment method could be used as a substitute for manual alignment, allowing all the peaks in the chromatogram to be used for multivariate analysis.

12.
This article presents a data analysis method for biomarker discovery in proteomics. In factor analysis-based discriminant models, the latent variables (LVs) are calculated from the response data measured at all instrument channels employed. Since some channels are irrelevant and their responses carry no useful information, the extracted LVs contain mixed information from both useful and irrelevant channels. In this work, clustering of variables (CLoVA) based on unsupervised pattern recognition is suggested as an efficient method for identifying the most informative spectral region, which is then used to construct a more predictive multivariate classification model. In the suggested method, the instrument channels (m/z values) are grouped into clusters by a self-organizing map. Subsequently, the spectral data of each cluster are used separately as the input variables of classification methods such as partial least squares-discriminant analysis (PLS-DA) and extended canonical variate analysis (ECVA). The proposed method is evaluated on two experimental data sets (ovarian and prostate cancer). The proposed method is found to distinguish cancerous from healthy samples with much higher sensitivity and selectivity than conventional PLS-DA and ECVA.

13.
Analytical Letters, 2012, 45(12): 1910-1921
Multiblock partial least squares (MB-PLS) is applied to the analysis of corn and tobacco samples by near-infrared diffuse reflectance spectroscopy. In the model, the spectra are separated into several sub-blocks along the wavenumber axis, and a different number of latent variables is used for each sub-block. Compared with ordinary PLS, the importance and contribution of each sub-block can be balanced through the super-weights and the use of different numbers of latent variables. As a result, the predictions obtained by the MB-PLS model are superior to those of ordinary PLS, especially for the large tobacco data sets with a large number of variables.

14.
Detecting trace explosive residues at standoff distances in real time is a difficult problem. One method ideally suited to real-time standoff detection is laser-induced breakdown spectroscopy (LIBS). However, atmospheric oxygen and nitrogen contribute to the LIBS signal from oxygen- and nitrogen-containing explosive compounds, complicating the discrimination of explosives from other organic materials. While bathing the sample in an inert gas removes the atmospheric oxygen and nitrogen interference, it cannot practically be applied to standoff LIBS. As an alternative, we have investigated the potential of double-pulse LIBS to improve the discrimination of explosives by diminishing the contribution of atmospheric oxygen and nitrogen to the LIBS signal. These initial studies compare the close-contact (< 1 m) LIBS spectra of explosives using single-pulse LIBS in argon with double-pulse LIBS in atmosphere. We have demonstrated improved discrimination of an explosive and an organic interferent using double-pulse LIBS to reduce the air entrained in the analytical plasma.

15.
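Two analysis techniques, analysis of variance (ANOVA) and partial least squares discriminant analysis (PLS-DA), were combined to analyze the urine 1H NMR spectra of vegetarian and regular-diet populations. ANOVA was used to decompose the data matrix into several independent factor matrices; after the interfering factors were filtered out, PLS-DA was used to model the single-factor data. The experimental results show that, between the dietary factor and the gender factor, the ANOVA/PLS-DA method can effectively reduce the mutual...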

16.
This study compares the results obtained with several chemometric methods (SIMCA, PLS2-DA, PLS2-DA with SIMCA, and PLS1-DA) in two infrared spectroscopic applications. The results were optimized by selecting spectral ranges containing discriminant information. In the first application, mid-infrared spectra of crude petroleum oils were classified according to their geographical origins. In the second application, near-infrared spectra of French virgin olive oils were classified into five registered designations of origin (RDOs). PLS-DA gave better classification performance than SIMCA in both applications, and in both cases the PLS1-DA classifications were 100% correct. The difficulties encountered with the SIMCA analyses were explained in terms of spectral variance: when the ratio of inter-spectral variance to intra-spectral variance was close to the Fc (Fisher criterion) threshold, SIMCA gave poor results. The discrimination power of the variable range selection procedure was estimated from the number of correctly classified samples.

17.
18.
Mid-infrared fiber-optic reflectance spectroscopy (mid-IR FORS) is a very interesting technique for artwork characterization. However, the fact that the spectra obtained are a mixture of surface (specular) and volume (diffuse) reflection is a significant drawback. The physical and chemical features of the artwork surface may produce distortions in the spectra that hinder comparison with reference databases acquired in transmission mode. Several studies have attempted to understand the influence of the different variables and to propose procedures that improve the interpretation of the spectra. This article focuses on the application of mid-IR FORS and multivariate calibration to the analysis of easel paintings. The objectives are to evaluate the influence of surface roughness on the spectra, the influence of matrix composition on the classification of unknown spectra, and the capability of obtaining pigment composition maps. A first evaluation of a fast procedure for spectra management and pigment discrimination is discussed. The results demonstrate the capability of the multivariate methods principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) to model the distortions of the reflectance spectra and to delimit and discriminate areas of uniform composition. The roughness of the painting surface is found to be an important factor affecting the shape and relative intensity of the spectra. Mapping the major pigments of a painting is possible using mid-IR FORS and PLS-DA when the calibration set is a palette that includes the potential pigments present in the artwork, mixed with the appropriate binder, and that shows the different paint textures.

19.
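Zhu Eryi, Lin Yan, Zhuang Zanyong. Chinese Journal of Analytical Chemistry (分析化学), 2007, 35(7): 973-977
A new partial least squares variable selection method is proposed. Using information from the PLS regression modeling process, the method removes variables that are redundant or have little influence on the model, thereby simplifying and optimizing the prediction model. When this method was combined with a variable dimension-expansion method to process ICP-MS data for 244 heroin samples seized from three source regions in Yunnan (Kunming, Simao, and Xishuangbanna), the discrimination accuracy of the model was greatly improved compared with the traditional algorithm, exceeding 95%. The resulting model contains few variables, which makes the influence of each variable on the model easy to analyze and interpret. The method can therefore be used for the effective identification or authentication of drug sources.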

20.
The potential of laser-induced breakdown spectroscopy (LIBS) to discriminate biological and chemical threat simulant residues prepared on multiple substrates and in the presence of interferents has been explored. The simulant samples tested include Bacillus atrophaeus spores, Escherichia coli, MS-2 bacteriophage, α-hemolysin from Staphylococcus aureus, 2-chloroethyl ethyl sulfide, and dimethyl methylphosphonate. The residue samples were prepared on polycarbonate, stainless steel and aluminum foil substrates by Battelle Eastern Science and Technology Center. LIBS spectra were collected by Battelle on a portable LIBS instrument developed by A3 Technologies. This paper presents the chemometric analysis of the LIBS spectra using partial least-squares discriminant analysis (PLS-DA). The performance of PLS-DA models based on the full LIBS spectra and on selected emission intensities and ratios has been compared. The full-spectra models generally provided better classification results owing to the inclusion of substrate emission features; however, the intensity/ratio models were able to correctly identify more types of simulant residues in the presence of interferents. Fusing the two types of PLS-DA models significantly improved classification performance for models built using multiple substrates. In addition to the major components of residue mixtures, minor components such as growth media and solvents can be identified with an appropriately designed PLS-DA model.
