Similar literature (20 results)
1.
In multivariate spectral calibration by principal component regression (PCR), the principal components (PCs) are calculated from the response data measured at all employed instrument channels; however, some channels are redundant and their responses carry no useful information. Thus, the extracted PCs mix information from both useful and redundant channels. In this work, we propose a segmentation approach based on unsupervised pattern recognition to identify the most informative spectral region and then to construct a stable multivariate calibration model by PCR. In this method, the instrument channels are clustered into different segments via a Kohonen self-organizing map. The spectral data of each segment are then subjected to PCA, and the derived PCs are used as input variables for an inverse least squares (ILS) regression model employing stepwise selection of the informative PCs. The proposed method was evaluated by the analysis of four simulated and six experimental data sets. It was found that the proposed method can model these data sets with prediction errors lower than those of conventional partial least squares (PLS) and PCR methods. In addition, the prediction ability of the method was better than that of previously reported models for these data sets. Copyright © 2011 John Wiley & Sons, Ltd.

2.
Given the severe physical and mental risks that methamphetamine (crystal) poses to humans, it is important to focus on preventing its distribution and reducing its use. In the current study, GC–MS analysis was combined with chemometrics to build a classification model for methamphetamine samples seized in different regions of Iran. Principal component analysis was not able to discriminate samples from different geographic regions. For discrimination, partial least squares discriminant analysis (PLS-DA) and extended canonical variate analysis (ECVA) were used, and a classification model was constructed to differentiate methamphetamine samples seized in three regions of Iran: south, west and central. PLS-DA performed well in the calibration step; however, ECVA showed better prediction ability. The differences among the classified samples may be due to differences in the synthetic route used in each of the three regions. Class sensitivity and selectivity for all three regions were excellent in the ECVA model, with only nonsignificant misclassifications. Cross-validation and external validation using a test set confirmed the classification model. The statistical results indicated a regional production/distribution pattern in the country.

3.
Paris polyphylla Smith var. yunnanensis (Franch.) Hand.-Mazz has multiple therapeutic properties, and its geographical origin may affect clinical efficacy. Tracing the geographical origin is therefore important for the authentication and quality assessment of this species. A total of 177 wild samples collected from central, southeast and northwest Yunnan Province, China, were analyzed by a single analytical method and by data fusion strategies (low- and mid-level) using Fourier transform mid-infrared (FT-MIR) and ultraviolet-visible (UV–vis) spectroscopies combined with chemometrics (partial least squares discriminant analysis (PLS-DA) and support vector machines with grid search (SVM-GS)) to categorize samples from different geographic origins. According to the results, the mid-level data fusion strategy, based on latent variables selected by PLS-DA, gave better generalization performance and accuracy rates than the single analytical method and the low-level data fusion strategy. Accuracy rates were almost 100% when both PLS-DA and SVM-GS were used to classify samples from the southeast and northwest districts based on the mid-level data set. For samples collected from central Yunnan, which were divided into seven categories in this paper, the accuracy rates of the training and test sets for PLS-DA and SVM-GS were satisfactory (>87%). Based on the mid-level data set, the classification results of both PLS-DA and SVM-GS showed satisfactory accuracy for the 177 samples. Additionally, the parameters obtained with the mid-level data set were as small as possible, suggesting that the method was robust and generalized well. A comprehensive method was thus established for tracing the origin of wild P. polyphylla Smith var. yunnanensis, which is meaningful for the quality control of herbal medicines.

4.
Du W, Gu T, Tang LJ, Jiang JH, Wu HL, Shen GL, Yu RQ. Talanta, 2011, 85(3): 1689-1694
As a greedy search algorithm, classification and regression tree (CART) easily lapses into overfitting when modeling microarray gene expression data. A straightforward solution is to filter out irrelevant genes by identifying the significant ones. Because some significant genes with multi-modal expression patterns, which exhibit systematic differences among within-class samples, are difficult to identify with existing methods, a strategy is proposed in which a unimodal transform of variables selected by interval segmentation purity (UTISP) is used for CART modeling. First, significant genes exhibiting varied expression patterns are identified by a variable selection method based on interval segmentation purity. Then, a unimodal transform is applied to provide unimodal featured variables for CART modeling via feature extraction. Because significant genes with complex expression patterns can be properly identified and unimodal features extracted in advance, the strategy potentially improves the performance of CART in combating overfitting or underfitting when modeling microarray data. The strategy is demonstrated using two microarray data sets. The results reveal that UTISP-based CART outperforms k-nearest neighbors and CARTs coupled with other gene identification strategies, indicating that UTISP-based CART holds great promise for microarray data analysis.

5.
Recently we proposed a new variable selection algorithm for classification problems, based on the clustering of variables concept (CLoVA). Here the same concept is applied to a regression problem, and the results are compared with those of conventional variable selection strategies for PLS. The basic idea behind the clustering of variables is that the instrument channels are grouped into different clusters by clustering algorithms. Then, the spectral data of each cluster are subjected to PLS regression. Several real data sets (Cargill corn, Biscuit dough, ACE QSAR, Soy, and Tablet) were used to evaluate the influence of the clustering of variables on the prediction performance of PLS. In almost all cases, the statistical parameters, especially the prediction error, show the superiority of CLoVA-PLS over other variable selection strategies. Finally, synergy clustering of variables (sCLoVA-PLS), which uses combinations of clusters, is proposed as an efficient modification of the CLoVA algorithm. The statistical parameters obtained indicate that variable clustering can separate the useful part from the redundant ones, so that a stable model can be built on the informative cluster.

6.
The predominance of partial least squares-discriminant analysis (PLS-DA) used to analyze metabolomics datasets (indeed, it is the most well-known tool to perform classification and regression in metabolomics), can be said to have led to the point that not all researchers are fully aware of alternative multivariate classification algorithms. This may in part be due to the widespread availability of PLS-DA in most of the well-known statistical software packages, where its implementation is very easy if the default settings are used. In addition, one of the perceived advantages of PLS-DA is that it has the ability to analyze highly collinear and noisy data. Furthermore, the calibration model is known to provide a variety of useful statistics, such as prediction accuracy as well as scores and loadings plots. However, this method may provide misleading results, largely due to a lack of suitable statistical validation, when used by non-experts who are not aware of its potential limitations when used in conjunction with metabolomics. This tutorial review aims to provide an introductory overview to several straightforward statistical methods such as principal component-discriminant function analysis (PC-DFA), support vector machines (SVM) and random forests (RF), which could very easily be used either to augment PLS or as alternative supervised learning methods to PLS-DA. These methods can be said to be particularly appropriate for the analysis of large, highly-complex data sets which are common output(s) in metabolomics studies where the numbers of variables often far exceed the number of samples. In addition, these alternative techniques may be useful tools for generating parsimonious models through feature selection and data reduction, as well as providing more propitious results. 
We sincerely hope that the general reader is left with little doubt that there are several promising and readily available alternatives to PLS-DA for analyzing large and highly complex data sets.

7.
Traditional Chinese herbal medicine (TCHM) plays an essential role in the international pharmaceutical industry due to its rich resources and unique curative properties. The flowers, stems, and leaves of Fritillaria contain a wide range of phytochemical compounds, including flavonoids, essential oils, saponins, and alkaloids, which may be useful for medicinal purposes. Fritillaria thunbergii Miq. bulbs are commonly used in traditional Chinese medicine as expectorants and antitussives. This paper presents a feasibility study on the use of hyperspectral imaging (HSI) integrated with convolutional neural networks (CNN) to distinguish twelve Fritillaria varieties (n = 360). The performance of support vector machines (SVM) and partial least squares-discriminant analysis (PLS-DA) was compared with that of the CNN. Principal component analysis (PCA) was used to assess the presence of cluster trends in the spectral data. Cross-validation was used to optimize the performance of the models. Among all the discriminant models, CNN was the most accurate, with 98.88% and 88.89% on the training and test sets, followed by PLS-DA with 92.59% and 81.94% and SVM with 99.65% and 79.17%, respectively. The results revealed that HSI in conjunction with deep learning can be used to classify Fritillaria thunbergii varieties rapidly and non-destructively.

8.
Near-infrared spectroscopy (NIRS) was applied for the direct and rapid collection of characteristic spectra from Rhizoma Corydalis, a common traditional Chinese medicine (TCM), with the aim of developing a method for classifying such substances according to their geographical origin. The powdered form of the TCM was collected from two different sources, and the NIR spectra were pretreated by the wavelet transform (WT) method. A training set of Rhizoma Corydalis spectral objects was modeled using least-squares support vector machines (LS-SVM), radial basis function artificial neural networks (RBF-ANN), partial least-squares discriminant analysis (PLS-DA) and K-nearest neighbors (KNN). All four chemometrics models performed reasonably on the spectral recognition and prediction criteria, and the LS-SVM method performed best, with over 95% success on both criteria. Overall, there were no statistically significant differences among the four methods. Thus, the NIR spectroscopic method supported by the four chemometrics models, especially LS-SVM, is recommended for classifying Rhizoma Corydalis samples according to their geographical origin.

9.
The maturity of Camellia oleifera fruit is one of the most important indicators for optimizing the harvest day, which, in turn, results in a high yield and good quality of the produced Camellia oil. A hyperspectral imaging (HSI) system in the visible and near-infrared range (400–1000 nm) was employed to assess the maturity stages of Camellia oleifera fruit. Hyperspectral images of 1000 samples, collected at five different maturity stages, were acquired. The spectrum of each sample was extracted from the identified region of interest (ROI) in each hyperspectral image. Spectral principal component analysis (PCA) revealed that the first three PCs showed potential for discriminating samples at different maturity stages. Two classification models, partial least-squares discriminant analysis (PLS-DA) and principal component analysis discriminant analysis (PCA-DA), based on the raw or pre-processed full spectra, were developed and their performances compared. The PLS-DA model based on second-order (2nd) derivative pre-processed spectra achieved the highest correct classification rates (CCRs) of 99.2%, 98.4%, and 97.6% in the calibration, cross-validation, and prediction sets, respectively. Key wavelengths selected by PC loadings, two-dimensional correlation spectroscopy (2D-COS), and the uninformative variable elimination and successive projections algorithm (UVE+SPA) were applied as inputs of the PLS-DA model; UVE-SPA-PLS-DA gave the optimal simplified model, with the highest CCR of 81.2% on the prediction set. In the confusion matrix of the optimal simplified model, satisfactory sensitivity, specificity, and precision were obtained. Misclassification was likely to occur between samples at maturity stages two, three, and four.
Overall, HSI with effectively selected variables, coupled with PLS-DA, could provide an accurate method and a simple reference system for rapidly discriminating the maturity stages of Camellia oleifera fruit samples.
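The sensitivity, specificity, precision and correct classification rate (CCR) mentioned in the abstract above can all be read off a confusion matrix. The following is a minimal sketch of that arithmetic, assuming rows are true classes and columns are predicted classes; the function names and the three-class counts are made up for illustration and are not taken from the study.

```python
def per_class_metrics(confusion):
    """Return sensitivity, specificity and precision for each class.

    confusion[i][j] counts samples of true class i predicted as class j
    (an assumed layout; some tools use the transpose).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # class k missed
        fp = sum(confusion[i][k] for i in range(n)) - tp  # others called k
        tn = total - tp - fn - fp
        metrics.append({
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
        })
    return metrics


def correct_classification_rate(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[k][k] for k in range(len(confusion))) / total


# Made-up 3-class counts, for illustration only (not data from the study).
toy = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 2, 8],
]
toy_metrics = per_class_metrics(toy)
toy_ccr = correct_classification_rate(toy)
```

Off-diagonal counts concentrated between neighbouring classes, as in the toy matrix, are the pattern one would expect when adjacent maturity stages are confused.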

10.
It is known that 1H NMR spectroscopy represents a good tool for predicting the grape variety, the geographical origin, and the year of vintage of wine. In the present study we have shown that classification models can be improved when 1H NMR profiles are fused with stable isotope (SNIF-NMR, 18O, 13C) data. Variable selection based on clustering of latent variables was performed on 1H NMR data. Afterwards, the combined data of 718 wine samples from Germany were analyzed using linear discriminant analysis (LDA), partial least squares-discriminant analysis (PLS-DA), factorial discriminant analysis (FDA) and independent components analysis (ICA). Moreover, several specialized multiblock methods (common components and specific weights analysis (ComDim), consensus PCA and consensus PLS-DA) were applied to the data.

11.
Matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-ToF MS) has been exploited extensively in the field of microbiology for the characterisation of bacterial species, the detection of biomarkers for early disease diagnosis and bacterial identification. Here, the multivariate data analysis technique of partial least squares-discriminant analysis (PLS-DA) was applied to ‘intact cell’ MALDI-ToF MS data obtained from Escherichia coli cell samples to determine if such an approach could be used to distinguish between, and characterise, different growth phases. PLS-DA is a technique that has the potential to extract systematic variation from large and noisy data sets by identifying a lower-dimensional subspace that contains latent information. The application of PLS-DA to the MALDI-ToF data obtained from cells at different stages of growth resulted in the successful classification of the samples according to the growth phase of the bacteria cultures. A further outcome of the analysis was that it was possible to identify the mass-to-charge (m/z) ratio peaks or ion signals that contributed to the classification of the samples. The Swiss-Prot/TrEMBL database and primary literature were then used to provisionally assign a small number of these m/z ion signals to proteins, and these tentative assignments revealed that the major contributors from the exponential phase were ribosomal proteins. Additional assignments were possible for the stationary phase and the decline phase cultures where the proteins identified were consistent with previously observed biological interpretation. In summary, the results show that MALDI-ToF MS, PLS-DA and a protein database search can be used in combination to discriminate between ‘intact cell’ E. coli cell samples in different growth phases and thus could potentially be used as a tool in process development in the bioprocessing industry to enhance cell growth and cell engineering strategies.

12.
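This study observed and compared how well different pattern recognition methods analysed 1H NMR spectra of the serum metabolome of healthy adults before and after orthogonal signal correction (OSC) filtering, in order to explore the feasibility of applying NMR-based metabolomics to clinical research and early disease diagnosis. Seventy-eight healthy adults fasted for 8 h before blood collection, as routinely required. After one-dimensional 600 MHz 1H spectra of the sera were recorded, principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA) and soft independent modelling of class analogy (SIMCA) were each applied to the spectra for pattern recognition. The results showed that, although no other strict restrictions (such as on diet, lifestyle or physiological cycle) were imposed before blood collection, PLS-DA after OSC filtering could completely separate the serum 1H spectra by sex, and its discriminating ability was better than that of PCA and SIMCA. Moreover, the main NMR integral regions related to sex classification obtained with OSC filtering were essentially the same as those reported in the literature for PLS-DA without OSC treatment. The changes in PLS-DA model performance after OSC removed different numbers of latent variables showed that, when OSC removed two latent variables, the eigenvalues of the first two latent variables were clearly larger than those of the following ones; the remaining residual was 20.82%, that is, 79.18% of the systematic variation in the X variables unrelated to the response variable Y had been removed. In this case the number of latent variables computed by PLS-DA was 1, whereas without OSC, or with OSC removing one latent variable, PLS-DA gave 3 and 2 latent variables, respectively. As indices of PLS-DA model quality, R2X denotes the percentage of the variation in the independent variables X captured by the latent variables of the PLS-DA model, R2Y denotes the percentage of the variation in the dependent variable Y captured by the latent variables, and Q2 (cum) is the cumulative percentage of the variation in X and Y that the latent variables of the PLS-DA model can predict after cross-validation. R2X reached its minimum when OSC removed two latent variables, indicating that the PLS-DA model then contained the least systematic variation; R2Y and Q2 (cum) both exceeded 80% and levelled off, indicating that the PLS-DA model was of good quality when OSC removed two latent variables. Evidently, OSC can remove the influence of factors such as diet and environment and reduce the heterogeneity of clinical samples, which is crucial for applying NMR-based metabolomics to clinical research. The number of latent variables removed by OSC filtering should be judged from the remaining residual, the eigenvalues of the removed latent variables, the number of latent variables computed by the PLS-DA model, and the indices reflecting model quality.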
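For readers unfamiliar with the quality indices R2X, R2Y and Q2 (cum) used in the abstract above, the block below writes out the definitions commonly used in chemometrics software. It is a sketch under assumptions: the abstract does not give formulas, so the symbols ($e_{ik}$ for X-residuals, $\hat{y}_i$ for fitted values, $\hat{y}_{(i)}$ for cross-validated predictions, $A$ for the number of latent variables, $SS_{a-1}$ for the residual sum of squares before component $a$) are introduced here for illustration, and the Q2 expressions are written for the response Y, which is the usual convention.

```latex
\begin{aligned}
R^2X &= 1 - \frac{\sum_{i}\sum_{k} e_{ik}^{2}}{\sum_{i}\sum_{k} x_{ik}^{2}}, \\
R^2Y &= 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^{2}}{\sum_{i} (y_i - \bar{y})^{2}}, \\
Q^2 &= 1 - \frac{\mathrm{PRESS}}{\sum_{i} (y_i - \bar{y})^{2}},
\qquad \mathrm{PRESS} = \sum_{i} \bigl(y_i - \hat{y}_{(i)}\bigr)^{2}, \\
Q^2(\mathrm{cum}) &= 1 - \prod_{a=1}^{A} \frac{\mathrm{PRESS}_a}{SS_{a-1}} .
\end{aligned}
```

Here X and Y are taken as centred (and, where applicable, scaled) data, and $\hat{y}_{(i)}$ is predicted by a model fitted without the cross-validation group that contains sample $i$.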

13.
Rhodiola, especially Rhodiola crenulata and Rhodiola rosea, is an increasingly widely used traditional medicine or dietary supplement in Asian and western countries. Because of the phytochemical diversity and the differences in therapeutic efficacy among Rhodiola species, it is crucial to identify them accurately. In this study, a simple and efficient method for classifying Rhodiola crenulata, Rhodiola rosea, and their confusable species (Rhodiola serrata, Rhodiola yunnanensis, Rhodiola kirilowii and Rhodiola fastigiata) was established using UHPLC fingerprints combined with chemical pattern recognition analysis. The results showed that similarity analysis and principal component analysis (PCA) could not achieve accurate classification among the six Rhodiola species. Linear discriminant analysis (LDA) combined with stepwise feature selection exhibited effective discrimination. Seven characteristic peaks responsible for accurate classification were selected, and their distinguishing ability was successfully verified by partial least-squares discriminant analysis (PLS-DA) and orthogonal partial least-squares discriminant analysis (OPLS-DA). Finally, the components of these seven characteristic peaks were identified as 1-(2-Hydroxy-2-methylbutanoate) β-D-glucopyranose, 4-O-glucosyl-p-coumaric acid, salidroside, epigallocatechin, 1,2,3,4,6-pentagalloylglucose, epigallocatechin gallate, and (+)-isolariciresinol-4′-O-β-D-glucopyranoside or (+)-isolariciresinol-4-O-β-D-glucopyranoside, respectively. The results provide useful information for the authentication and classification of Rhodiola species.

14.
Many analytical approaches such as mass spectrometry generate large amounts of data (input variables) per sample analysed, and not all of these variables are important or related to the target output of interest. The selection of a smaller number of variables prior to sample classification is a widespread task in many research studies, where attempts are made to seek the lowest possible set of variables that are still able to achieve a high level of prediction accuracy; in other words, there is a need to generate the most parsimonious solution when the number of input variables is huge but the number of samples/objects is smaller. Here, we compare several different variable selection approaches in order to ascertain which of these are ideally suited to achieve this goal. All variable selection approaches were applied to the analysis of a common set of metabolomics data generated by Curie-point pyrolysis mass spectrometry (Py-MS), where the goal of the study was to classify the Gram-positive bacteria Bacillus. These approaches include stepwise forward variable selection, used for linear discriminant analysis (LDA); variable importance for projection (VIP) coefficient, employed in partial least squares-discriminant analysis (PLS-DA); support vector machines-recursive feature elimination (SVM-RFE); as well as the mean decrease in accuracy and mean decrease in Gini, provided by random forests (RF). Finally, a double cross-validation procedure was applied to minimize the consequence of overfitting. The results revealed that RF with its variable selection techniques, and SVM combined with SVM-RFE as a variable selection method, displayed the best results in comparison to other approaches.
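The double cross-validation mentioned in the abstract above is usually understood as a nested scheme: an inner loop chooses a model setting (for example, how many variables to keep) and an outer loop estimates performance on samples the inner loop never saw. The sketch below illustrates only that control flow. It is not the authors' procedure; `fit` and `score` are hypothetical callbacks supplied by the caller, and the folds are taken in order without shuffling or stratification, which a real analysis would normally add.

```python
def k_fold_indices(indices, k):
    """Split a list of sample indices into k (train, test) pairs."""
    folds = [indices[i::k] for i in range(k)]
    pairs = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        pairs.append((train, test))
    return pairs


def double_cross_validation(n_samples, outer_k, inner_k, fit, score, candidates):
    """Nested CV: the inner loop picks a setting, the outer loop scores it.

    fit(train_idx, setting) -> model and score(model, test_idx) -> float are
    hypothetical callbacks; candidates is a list of settings to choose from.
    Higher scores are assumed to be better.
    """
    outer_scores = []
    all_indices = list(range(n_samples))
    for outer_train, outer_test in k_fold_indices(all_indices, outer_k):
        inner_pairs = k_fold_indices(outer_train, inner_k)

        def inner_mean(setting):
            # Average inner-fold score, using outer-training samples only.
            total = sum(score(fit(tr, setting), te) for tr, te in inner_pairs)
            return total / len(inner_pairs)

        best = max(candidates, key=inner_mean)
        # Refit with the chosen setting and score on the untouched outer fold.
        outer_scores.append(score(fit(outer_train, best), outer_test))
    return outer_scores
```

Because variable selection happens inside `fit`, on training indices only, the outer-fold scores are not inflated by having selected variables on the test samples.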

15.
Analytical Letters, 2012, 45(13): 1810-1823
Chromatographic profiles of Rhizoma et Radix Notopterygii (RRN, “Qianghuo” in Chinese), a complex traditional Chinese medicine (TCM), were collected by high-performance liquid chromatography with diode array detection (HPLC-DAD) at 330 nm. These data profiles were used as fingerprints to investigate quality control classification modeling of the RRN samples. In contrast to the classical methods for discriminating TCMs, which use only the common HPLC peaks, all chromatographic profile data were pre-processed by the correlation optimized warping method and polynomial functions; these data were then submitted as fingerprints (variables) for classification on the basis of sample origin. The chemometrics methods used for calibration modeling and subsequent sample classification, namely least squares support vector machine (LS-SVM), artificial neural network (ANN), and partial least squares discriminant analysis (PLS-DA), all produced satisfactory calibration and classification results.

16.
A new analytical strategy based on mass spectrometry fingerprinting combined with the NIST-MS search program for pattern recognition is evaluated and validated. A case study dealing with the tracing of the geographical origin of virgin olive oils (VOOs) proves the capabilities of mass spectrometry fingerprinting coupled with the NIST-MS search program for classification. The volatile profiles of 220 VOOs from Liguria and other Mediterranean regions were analysed by secondary electrospray ionization-mass spectrometry (SESI-MS). MS spectra of VOOs were classified according to their origin by the freeware NIST-MS search v 2.0. The NIST classification results were compared to well-known pattern recognition techniques, such as linear discriminant analysis (LDA), partial least-squares discriminant analysis (PLS-DA), k-nearest neighbours (kNN), and counter-propagation artificial neural networks (CP-ANN). The NIST-MS search program correctly predicted 96% of the Ligurian VOOs and 92% of the non-Ligurian ones in an external independent data set, outperforming the traditional chemometric techniques (prediction abilities in the external validation achieved by kNN were 88% and 84% for the Ligurian and non-Ligurian categories, respectively). This proves that the NIST-MS search software is a useful classification tool.

17.
Dual-domain classification analysis is proposed to identify pigments used in works of art studied by Raman spectroscopy and X-ray fluorescence spectrometry. By means of this methodology, Raman and X-ray fluorescence data are jointly processed by a high-level fusion approach. The proposed system aims to avoid the pre-processing stage and to process the raw data obtained from the instrument directly. The system is tested with spectra contaminated with background components of different shapes and intensities and with spectra whose background has been removed by line segment correction. The benefits of the approach were well demonstrated in a study of ochre pigment classification. The approach is based on the main advantage of the wavelet transform, which is multiresolution. Each spectrum is split into blocks, according to a specific frequency, to form a wavelet prism. Partial least squares-discriminant analysis (PLS-DA) is then applied to those blocks which contain the deterministic part of the signal and are not influenced by noise and background signal components. Finally, to obtain the final classification assignment, high-level data fusion of the classification results (decision levels) obtained from the PLS-DA analysis is performed by means of fuzzy aggregation connective operators. Our study showed that fuzzy aggregation may be suitable for performing high-level data fusion on dual-domain data. This method can be automated so that classification can be rapid. It can handle classifications with different levels of difficulty and requires no prior knowledge of sample composition.
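High-level (decision-level) fusion with fuzzy aggregation, as described in the abstract above, can be pictured as combining per-class membership values from two classifiers with a connective operator. The sketch below is illustrative only: the abstract does not say which operators were used, so the four shown here (minimum, maximum, product, arithmetic mean) are common textbook choices, and the class names and membership values are hypothetical.

```python
def fuse_decisions(memberships_a, memberships_b, operator="mean"):
    """Combine two per-class membership dicts (values in [0, 1]) into one.

    Both dicts are assumed to share the same class labels as keys.
    Returns the winning class and the fused memberships.
    """
    operators = {
        "min": min,                          # conjunctive: both must agree
        "max": max,                          # disjunctive: either suffices
        "product": lambda a, b: a * b,       # product t-norm
        "mean": lambda a, b: (a + b) / 2.0,  # compromise (averaging)
    }
    combine = operators[operator]
    fused = {c: combine(memberships_a[c], memberships_b[c])
             for c in memberships_a}
    winner = max(fused, key=fused.get)
    return winner, fused


# Hypothetical memberships from a Raman-based and an XRF-based classifier.
raman = {"class_A": 0.6, "class_B": 0.4}
xrf = {"class_A": 0.2, "class_B": 0.5}
```

With these toy values the conjunctive and averaging operators favour `class_B`, while the disjunctive `max` favours `class_A`, which shows why the choice of connective matters when the two domains disagree.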

18.
Using a series of thirteen organic materials that includes novel high-nitrogen energetic materials, conventional organic military explosives, and benign organic materials, we have demonstrated the importance of variable selection for maximizing residue discrimination with partial least squares discriminant analysis (PLS-DA). We built several PLS-DA models using different variable sets based on laser induced breakdown spectroscopy (LIBS) spectra of the organic residues on an aluminum substrate under an argon atmosphere. The model classification results for each sample are presented and the influence of the variables on these results is discussed. We found that using the whole spectra as the data input for the PLS-DA model gave the best results. However, variables due to the surrounding atmosphere and the substrate contribute to discrimination when the whole spectra are used, indicating this may not be the most robust model. Further iterative testing with additional validation data sets is necessary to determine the most robust model.

19.
Time-of-flight mass spectrometry along with statistical analysis was used to study metabolic profiles among rats fed resistant starch (RS) diets. Fischer 344 rats were fed four starch diets consisting of 55 % (w/w, dbs) starch. A control diet consisting of corn starch was compared against three RS diets. The RS diets were high-amylose corn starch (HA7), HA7 chemically modified with octenyl succinic anhydride, and stearic-acid-complexed HA7 starch. A subgroup received antibiotic treatment to determine if perturbations in the gut microbiome were long lasting. A second subgroup was treated with azoxymethane (AOM), a carcinogen. At the end of the 8-week study, cecal and distal colon content samples were collected from the sacrificed rats. Metabolites were extracted from the cecal and distal colon samples into acetonitrile. The extracts were then analyzed on an accurate-mass time-of-flight mass spectrometer to obtain their metabolic profiles. The data were analyzed using partial least-squares discriminant analysis (PLS-DA), with a training set and a verification set used to classify samples within diet and treatment groups. PLS-DA could reliably differentiate the diet treatments for both cecal and distal colon samples. The antibiotic-treated and untreated subgroups were well classified for cecal samples and modestly separated for distal colon samples. PLS-DA had limited success in separating distal colon samples of rats given AOM from those of untreated rats, and the cecal samples from AOM-treated rats were very poorly classified. Mass spectrometry profiling coupled with PLS-DA can readily classify metabolite differences among rats given RS diets.

20.
The partial least-squares (PLS) algorithm has become popular for explorative multivariate data analysis and for multivariate calibration. The same PLS algorithm can also be used for confirmatory data analysis. The discussion is limited to analysis of a single response variable. A close correspondence of PLS1 regression to classical analysis of variance (ANOVA) is demonstrated. The design of an experiment is described in terms of discrete design variables for main effects and simple interactions (dummy variables). These are used as regressors X = (x1, x2, …) for modelling the response variable of the experiment, y. As in conventional use of PLS1 regression, the algorithm gives a concentrated model or diagram of the most important, y-relevant variability types in the X-data. In the present case, this gives the combination of design variables that models the variations in y. A simple plot of the resulting factor loadings immediately reveals the important design variables. Statistical tests and confidence regions in the PLS solution give additional safeguards against interpretation of spurious effects. The method is applied to two data sets. One concerns assessment of personal preference for blackcurrant juice, studied in a 2^5 factorial experiment; these data are also studied with missing values and as fractional factorials. The other concerns spectrophotometric absorbance-based colour assessments of pigment in strawberry jam in a 3-factor design with 2, 2 and 3 levels in the respective factors.
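The link between PLS1 loadings and design effects described in the abstract above can be illustrated on a tiny example. The sketch below computes only the first PLS1 component for a 2×2 factorial design coded as ±1 dummy variables (two main effects and their interaction). The design, the response values and the function name are invented for illustration and are not from the paper; for a balanced, orthogonal design like this one, the loading weights come out proportional to the classical effect estimates.

```python
import math


def pls1_first_component(X, y):
    """First PLS1 component: loading weights w, scores t and y-loading q.

    Assumes y is not constant, so the covariance vector is non-zero.
    """
    n, p = len(X), len(X[0])
    # Mean-centre each column of X and the response y.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Loading weights: the direction in X-space of maximum covariance with y.
    cov = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(c * c for c in cov))
    w = [c / norm for c in cov]
    # Scores, then the regression coefficient of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return w, t, q


# Invented 2x2 factorial: columns are x1, x2 and the interaction x1*x2.
design = [
    [-1, -1, +1],
    [+1, -1, -1],
    [-1, +1, -1],
    [+1, +1, +1],
]
# Response built as 10 + 3*x1 + 0*x2 + 1*x1*x2 (no noise).
response = [8.0, 12.0, 6.0, 14.0]
weights, scores, y_loading = pls1_first_component(design, response)
```

In this noise-free toy case the weights are in the ratio 3 : 0 : 1, matching the coefficients used to build the response, so a plot of the loadings would single out x1 and, more weakly, the interaction.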
