Similar literature
1.
Sinkov NA  Harynuk JJ 《Talanta》2011,83(4):1079-1087
A novel metric termed cluster resolution is presented. This metric compares the separation of clusters of data points while simultaneously considering the shapes of the clusters and their relative orientations. Using cluster resolution in conjunction with an objective variable ranking metric allows for fully automated feature selection for the construction of chemometric models. The metric is based on the maximum size of the confidence ellipses that can be constructed around clusters of points representing different classes of objects without any overlap of the ellipses. For demonstration purposes, PCA was used to classify samples of gasoline based on their octane rating. The entire GC-MS chromatogram of each sample, comprising over 2 × 10^6 variables, was considered. As an example, automated ranking by ANOVA was applied, followed by a forward selection approach to choose variables for inclusion. This approach can be generally applied to feature selection for a variety of applications and represents a significant step towards the development of fully automated, objective construction of chemometric models.

2.
The alignment of instrumental signals, such as chromatograms, is regarded as an important step before applying multivariate chemometric techniques for data analysis. Many alignment techniques are now available; they differ in how they achieve their goal and correct peak shifts in a set of chromatograms with differing degrees of success. Almost all alignment techniques, with few exceptions [e.g., W. Yu, B. Wu, N. Lin, K. Stone, K. Williams, H. Zhao, Comput. Biol. Chem. 30 (2006) 27], require a careful choice of the target profile. Selecting a target signal is not an easy task, and some difficulties related to this selection are discussed in this paper. An analysis of several simulated sets of chromatographic signals showed that target selection can be a crucial step if the aligned signals are then used as input to unsupervised approaches, such as principal component analysis, or to supervised methods, such as discriminant partial least squares. The proposals for target selection known to date are reviewed. In this study, the target profile with the highest correlation coefficient among all the signals studied gave the most satisfactory results.
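The correlation-based target choice described in this abstract can be sketched as follows. This is only an illustration under an assumption: "highest correlation coefficient among all the signals" is read here as the highest mean Pearson correlation with every other signal, which the abstract does not spell out. The names `pearson`, `select_target`, and `gaussian_peak` are hypothetical helpers, and the data are synthetic.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_target(signals):
    """Index of the signal with the highest mean correlation to all others.

    Assumption: this is one plausible reading of the selection rule.
    """
    best_idx, best_score = -1, -math.inf
    for i, s in enumerate(signals):
        others = [pearson(s, t) for j, t in enumerate(signals) if j != i]
        score = sum(others) / len(others)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


def gaussian_peak(center, n=50, width=3.0):
    """Synthetic single-peak chromatogram used only for the demonstration."""
    return [math.exp(-0.5 * ((k - center) / width) ** 2) for k in range(n)]


# Five copies of one peak with different retention shifts.
signals = [gaussian_peak(c) for c in (20, 24, 25, 26, 30)]
target_index = select_target(signals)
```

In this toy set the signal whose peak sits in the middle of the shifted copies is the one selected, which matches the intuition that a good target is the profile most similar to all the others.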

3.
A useful methodology is introduced for the analysis of data obtained via gas chromatography with mass spectrometry (GC-MS) utilizing a complete mass spectrum at each retention time interval in which a mass spectrum was collected. Principal component analysis (PCA) with preprocessing by both piecewise retention time alignment and analysis of variance (ANOVA) feature selection is applied to all mass channels collected. The methodology involves concatenating all concurrently measured individual m/z chromatograms from m/z 20 to 120 for each GC-MS separation into a row vector. All of the sample row vectors are incorporated into a matrix where each row is a sample vector. This matrix is piecewise aligned and reduced by ANOVA feature selection. Application of the preprocessing steps (retention time alignment and feature selection) to all mass channels collected during the chromatographic separation allows considerably more selective chemical information to be incorporated in the PCA classification, and is the primary novelty of the report. This methodology is objective and requires no knowledge of the specific analytes of interest, as in selective ion monitoring (SIM), and does not restrict the mass spectral data used, as in both SIM and total ion current (TIC) methods. Significantly, the methodology allows for the classification of data with low resolution in the chromatographic dimension because of the added selectivity from the complete mass spectral dimension. This allows for the successful classification of data over significantly decreased chromatographic separation times, since high-speed separations can be employed. The methodology is demonstrated through the analysis of a set of four differing gasoline samples that serve as model complex samples. For comparison, the gasoline samples are analyzed by GC-MS over both 10-min and 10-s separation times. The successfully classified 10-min GC-MS TIC data served as the benchmark analysis to compare to the 10-s data. 
When only alignment and feature selection were applied to the 10-s gasoline separations using GC-MS TIC data, PCA failed. PCA was successful for the 10-s gasoline separations when the methodology was applied with all the m/z information. With ANOVA feature selection, chromatographic regions with Fisher ratios greater than 1500 were retained in a new matrix and subjected to PCA, yielding successful classification for the 10-s separations.
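As a rough illustration of ANOVA feature selection, the sketch below computes a one-way ANOVA Fisher (F) ratio for every column of a samples-by-variables matrix and keeps the columns above a threshold. It assumes NumPy is available; the function name `fisher_ratios` and the toy data are hypothetical, and the threshold of 1500 is reused from the abstract purely for illustration, since a suitable value depends on the data set.

```python
import numpy as np


def fisher_ratios(X, labels):
    """One-way ANOVA F ratio for each column of X (rows are samples)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n, k = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[labels == c]
        class_mean = Xc.mean(axis=0)
        between += Xc.shape[0] * (class_mean - grand_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    # Avoid division by zero for variables with no within-class variance.
    within = np.maximum(within, np.finfo(float).tiny)
    return (between / (k - 1)) / (within / (n - k))


# Toy data: 6 samples, 3 variables, 2 classes. Only column 0 separates the classes.
X = np.array([
    [1.0, 2.0, 1.0],
    [1.1, 3.0, 2.0],
    [0.9, 2.5, 3.0],
    [5.0, 2.4, 1.0],
    [5.1, 2.1, 2.0],
    [4.9, 3.1, 3.0],
])
labels = [0, 0, 0, 1, 1, 1]

ratios = fisher_ratios(X, labels)
selected = np.flatnonzero(ratios > 1500)  # indices of retained variables
X_reduced = X[:, selected]                # matrix passed on to PCA
```

Only the retained columns would then be submitted to PCA; variables dominated by within-class noise drop out.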

4.
《Analytical letters》2012,45(14):2475-2492
Recently, the fingerprinting approach using chromatography has become one of the most potent tools for quality assessment of herbal medicine. Because of the complexity of the chromatographic fingerprint and the irreproducibility of chromatographic instruments and experimental conditions, several chemometric approaches, such as variance analysis, peak alignment, correlation analysis, and pattern recognition, were employed to process the chromatographic fingerprint in this work. To facilitate the data preprocessing, a software package named Computer Aided Similarity Evaluation (CASE) was also developed. All programs of the chemometric algorithms for CASE were coded in MATLAB 5.3 on Windows. The software covers data loading, removing, cutting, smoothing, compressing, background and retention time shift correction, normalization, peak identification and matching, variation determination of common peaks/regions, similarity comparison, sample classification, and other data processes associated with the chromatographic fingerprint. A case study of the high-pressure liquid chromatographic (HPLC) fingerprints of 50 Rhizoma chuanxiong samples from different sources demonstrated that the chemometric approaches investigated in this work were reliable and user friendly for the preprocessing of chromatographic fingerprints of herbal medicines for quality assessment.

5.
Yao W  Yin X  Hu Y 《Journal of chromatography. A》2007,1160(1-2):254-262
The alignment of chromatographic signals is an important preprocessing step before further multivariate analysis. This paper presents a method, automated peak alignment by beam search (Auto-PABS), to solve the problem of peak shift in chemical chromatographic fingerprints by piecewise shifting and linear interpolation. It is characterized by searching an adaptive range for the shift and interpolation values of each segment. This search range is estimated by calculating the fast Fourier transform cross correlation between the sample segment and its corresponding reference segment. Arbitrary peak alignment is thus avoided when the real peak shifts are unknown in a large data set. Since the maximum of the search range is close to the real shift, a more accurate beam search is adopted to accomplish the optimization. Simulated data and herbal medicine fingerprints from HPLC and GC are used for evaluation. The output matrix of aligned chromatographic profiles is used directly for principal component analysis, yielding satisfactory results on real samples.
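The shift estimate mentioned in this abstract relies on FFT cross correlation between a sample segment and a reference segment. A minimal sketch of that single step, assuming NumPy and synthetic Gaussian peaks, is given below. The function name `estimate_shift` is hypothetical, and the sketch does not attempt the beam search or the piecewise interpolation of Auto-PABS itself.

```python
import numpy as np


def estimate_shift(sample, reference):
    """Lag (in data points) of `sample` relative to `reference`.

    A positive value means the sample peak elutes later than the reference.
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n = sample.size + reference.size - 1
    nfft = 1 << (n - 1).bit_length()  # zero padding avoids circular wrap-around
    spectrum = np.fft.rfft(sample, nfft) * np.conj(np.fft.rfft(reference, nfft))
    xcorr = np.fft.irfft(spectrum, nfft)
    lag = int(np.argmax(xcorr))
    # Indices at or beyond sample.size encode negative lags.
    if lag >= sample.size:
        lag -= nfft
    return lag


# Synthetic segment: the sample peak is delayed by 7 points.
t = np.arange(200)
reference = np.exp(-0.5 * ((t - 80) / 4.0) ** 2)
sample = np.exp(-0.5 * ((t - 87) / 4.0) ** 2)

shift = estimate_shift(sample, reference)
early_shift = estimate_shift(reference, sample)
```

The estimated lag could serve as the centre of a search range for a finer optimization, which is how the abstract describes its use.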

6.
A fast and objective chemometric classification method is developed and applied to the analysis of gas chromatography (GC) data from five commercial gasoline samples. The gasoline samples serve as model mixtures, whereas the focus is on the development and demonstration of the classification method. The method is based on objective retention time alignment (referred to as piecewise alignment) coupled with analysis of variance (ANOVA) feature selection prior to classification by principal component analysis (PCA) using optimal parameters. The degree-of-class-separation is used as a metric to objectively optimize the alignment and feature selection parameters using a suitable training set thereby reducing user subjectivity, as well as to indicate the success of the PCA clustering and classification. The degree-of-class-separation is calculated using Euclidean distances between the PCA scores of a subset of the replicate runs from two of the five fuel types, i.e., the training set. The unaligned training set that was directly submitted to PCA had a low degree-of-class-separation (0.4), and the PCA scores plot for the raw training set combined with the raw test set failed to correctly cluster the five sample types. After submitting the training set to piecewise alignment, the degree-of-class-separation increased (1.2), but when the same alignment parameters were applied to the training set combined with the test set, the scores plot clustering still did not yield five distinct groups. Applying feature selection to the unaligned training set increased the degree-of-class-separation (4.8), but chemical variations were still obscured by retention time variation and when the same feature selection conditions were used for the training set combined with the test set, only one of the five fuels was clustered correctly. 
However, piecewise alignment coupled with feature selection yielded a reasonably optimal degree-of-class-separation for the training set (9.2), and when the same alignment and ANOVA parameters were applied to the training set combined with the test set, the PCA scores plot correctly classified the gasoline fingerprints into five distinct clusters.

7.
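Chemometric methods for the automated resolution of complex chromatographic signals   Cited by: 1 (self-citations: 0, citations by others: 1)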
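Chromatography and its hyphenated techniques are maturing steadily and moving toward automation, high throughput, and speed. By means of "mathematical separation", chemometrics can achieve the automated resolution of chromatographic signals and has become a very active research area in modern chromatographic analysis. However, earlier chemometric methods cannot fully and effectively achieve the automated resolution of complex chromatographic signals. Automated chromatographic resolution algorithms have therefore become a focus for researchers, and many new automated resolution algorithms have been proposed. For the automated analysis of complex one-dimensional chromatographic data, and of the two-dimensional and higher-dimensional data produced by hyphenated instruments, chemometric research has concentrated on automated peak detection, background and baseline drift correction, retention time shift correction, and the resolution of overlapping peaks. This paper summarizes and reviews the principles and applications of chemometric methods developed over the past decade for the automated resolution of chromatographic signals from complex systems, and compares the strengths and weaknesses of the various methods. On this basis, future research directions in the field are discussed in light of the current difficulties in automated chromatographic analysis.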

8.
An in-depth study is presented to better understand how data reduction via averaging impacts retention alignment and the subsequent chemometric analysis of data obtained using gas chromatography (GC). We specifically study the use of signal averaging to reduce GC data, retention time alignment to correct run-to-run retention shifting, and principal component analysis (PCA) to classify chromatographic separations of diesel samples by sample class. Diesel samples were selected because they provide sufficient complexity to study the impact of data reduction on the data analysis strategies. The data reduction process reduces the data sampling ratio, S(R), which is defined as the number of data points across a given chromatographic peak width (i.e., the four standard deviation peak width). Ultimately, sufficient data reduction causes the chromatographic resolution to decrease, however with minimal loss of chemical information via the PCA. Using PCA, the degree of class separation (DCS) is used as a quantitative metric. Three "Paths" of analysis (denoted A-C) are compared to each other in the context of a "benchmark" method to study the impact of the data sampling ratio on preserving chemical information, which is defined by the DCS quantitative metric. The benchmark method is simply aligning data and applying PCA, without data reduction. Path A applies data alignment to collected data, then data reduction, and finally PCA. Path B applies data reduction to collected data, and then data alignment, and finally PCA. The optimized path, namely Path C, is created from Paths A and B, whereby collected data are initially reduced to fewer data points (smaller S(R)), then aligned, and then further reduced to even fewer points and finally analyzed with PCA to provide the DCS metric. Overall, following Path C, one can successfully and efficiently classify chromatographic data by reducing to a S(R) of ~15 before alignment, and then reducing down to S(R) of ~2 before performing PCA. 
Indeed, averaged over 15 different column length-with-temperature ramp rate combinations spanning a broad range of separation conditions, Path C resulted in only a ~15% loss in classification capability (via PCA) when the loss in chromatographic resolution was ~36%.
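The data reduction step studied in this abstract can be illustrated with block averaging, under the assumption that "signal averaging" means averaging consecutive, non-overlapping blocks of points. The helper names `bin_average` and `sampling_ratio`, the peak width, and the acquisition rate are all hypothetical values chosen so that the numbers land near the S(R) of ~15 and ~2 quoted for Path C.

```python
def bin_average(signal, factor):
    """Average consecutive blocks of `factor` points.

    A trailing partial block is dropped.
    """
    n_blocks = len(signal) // factor
    return [
        sum(signal[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


def sampling_ratio(sigma_seconds, points_per_second):
    """Data points across the four-standard-deviation peak width, S(R)."""
    return 4 * sigma_seconds * points_per_second


# Hypothetical peak: sigma = 0.5 s acquired at 30 points per second.
raw_sr = sampling_ratio(0.5, 30)  # 60 points per peak
sr_before_alignment = raw_sr / 4  # after a first 4-fold reduction: 15
sr_before_pca = raw_sr / (4 * 8)  # after a further 8-fold reduction: about 2

reduced = bin_average([1, 2, 3, 4, 5, 6, 7], 2)
```

Each k-fold averaging divides S(R) by k, so the two-stage reduction of Path C corresponds to choosing one factor before alignment and a second one before PCA.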

9.
Metabolic datasets can provide an overview of herbs of different origin when analyzed with statistical procedures. The results often deviate to some degree because of peak shifts in the chromatographic signals. To solve this problem, an improved algorithm that combines sub-window factor analysis with mass spectral information is proposed. The algorithm locates the peaks with a peak detection approach derived from either a multi-scale Gaussian function or the Haar wavelet, each with a different scope of application; the candidate drift points at each peak are estimated by fast Fourier transform cross correlation; and the best drift point at each candidate peak is confirmed by sub-window factor analysis and mass spectral information in nontargeted metabolic profiling. Finally, the peak regions are aligned against a reference chromatogram, and the non-peak regions are handled by linear interpolation. The chromatographic signals of 30 Bupleurum samples were aligned as an illustration of this algorithm, and the samples could then be well distinguished using statistical procedures. The results demonstrate that the presented method performs better than other mass-spectra-based algorithms when aligning co-eluted peaks.

10.
Coffee samples were analyzed by GC/MS in order to determine the most important peaks for the discrimination of the varieties Arabica and Robusta. The resulting peak tables from chromatographic analysis were aligned and pretreated before being submitted to multivariate analysis. A rapid and easy-to-perform peak alignment procedure, which does not require advanced programming skills, was compared with the tedious manual alignment procedure. The influence of three types of data pretreatment (normalization, logarithmic transformation, and square root transformation) and their combinations on the variables selected as most important by the regression coefficients of partial least squares-discriminant analysis (PLS-DA) is shown. Test samples different from those used in the calibration, together with a comparison with the substances already known to be responsible for the discrimination of Arabica and Robusta coffees, were used to determine the best pretreatments for both datasets. The data pretreatment consisting of square root transformation followed by normalization (RN) was chosen as the most appropriate. The results showed that the much quicker automated alignment method could be used as a substitute for the manual alignment method, allowing all the peaks in the chromatogram to be used for multivariate analysis.
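The RN pretreatment (square root transformation followed by normalization) can be sketched on a small peak table. The abstract does not state which normalization was used, so scaling each sample to unit total is an assumption made here; the function name `root_normalize` and the peak areas are hypothetical.

```python
import math


def root_normalize(peak_table):
    """Square-root transform each peak area, then scale each row to unit sum.

    Assumption: normalization means dividing by the row total.
    Rows that sum to zero are returned unscaled.
    """
    result = []
    for row in peak_table:
        rooted = [math.sqrt(v) for v in row]
        total = sum(rooted)
        result.append([v / total for v in rooted] if total > 0 else rooted)
    return result


# Hypothetical peak areas: two samples (rows), three aligned peaks (columns).
peak_table = [
    [4.0, 16.0, 0.0],
    [9.0, 9.0, 36.0],
]
pretreated = root_normalize(peak_table)
```

The square root compresses large peaks relative to small ones before scaling, so minor peaks carry more weight in the subsequent PLS-DA than they would with normalization alone.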

11.
Liquid chromatography-mass spectrometry (LC/MS) has become the method of choice for characterizing complex mixtures. These analyses often involve quantitative comparison of components in multiple samples. To achieve automated sample comparison, the components of interest must be detected and identified, and their retention times aligned and peak areas calculated. This article describes a simple pairwise iterative retention time alignment algorithm, based on the divide-and-conquer approach, for alignment of ion features detected in LC/MS experiments. In this iterative algorithm, ion features in the sample run are first aligned with features in the reference run by applying a single constant shift of retention time. The sample chromatogram is then divided into two shorter chromatograms, which are aligned to the reference chromatogram the same way. Each shorter chromatogram is further divided into even shorter chromatograms. This process continues until each chromatogram is sufficiently narrow so that ion features within it have a similar retention time shift. In six pairwise LC/MS alignment examples containing a total of 6507 confirmed true corresponding feature pairs with retention time shifts up to five peak widths, the algorithm successfully aligned these features with an error rate of 0.2%. The alignment algorithm is demonstrated to be fast, robust, fully automatic, and superior to other algorithms. After alignment and gap-filling of detected ion features, their abundances can be tabulated for direct comparison between samples.

12.
An improved method for real-time selection of the target for the alignment of gas chromatographic data is described. Further outlined is a simple method to determine the accuracy of the alignment procedure. The target selection method proposed uses a moving window of aligned chromatograms to generate a target, herein referred to as the window target method (WTM). The WTM was initially tested using a series of 100 simulated chromatograms, and additionally evaluated using a series of 55 diesel fuel gas chromatograms obtained with four fuel samples. The WTM was evaluated via a comparison to a related method (the nearest neighbor method (NNM)). The results using the WTM with simulated chromatograms showed a significant improvement in the correlation coefficient and the accuracy of alignment when compared to the alignments performed using the NNM. A significant improvement in real-time alignment accuracy, as assessed by a correlation coefficient metric, was achieved with the WTM (starting at ∼1.0 and declining to only ∼0.985 for the 100th sample), relative to the NNM (starting at ∼1.0 and declining to ∼0.4 for the 100th sample) for the simulated chromatogram study. The results determined when using the WTM with the diesel fuels also showed an improvement in correlation coefficient and accuracy of the within-class alignments as compared to the results obtained from the NNM. In practice, the WTM could be applied to the real-time analysis of process and feedstock industrial streams to enable real-time decision making from the more precisely aligned chromatographic data.

13.
Simulated chromatographic separations were used to study the performance of piecewise retention time alignment and to demonstrate automated unsupervised (without a training set) parameter optimization. The average correlation coefficient between the target chromatogram and all remaining chromatograms in the data set was used to optimize the alignment parameters. This approach frees the user from providing class information and makes the alignment algorithm applicable to classifying completely unknown data sets. The average peak in the raw simulated data set was shifted up to two peak-widths-at-base (average relative shift=2.0) and after alignment the average relative shift was improved to 0.3. Piecewise alignment was applied to severely shifted GC separations of gasolines and reformate distillation fraction samples. The average relative shifts in the raw gasolines and reformates data were 4.7 and 1.5, respectively, but after alignment improved to 0.5 and 0.4, respectively. The effect of piecewise alignment on peak heights and peak areas is also reported. The average relative difference in peak height was -0.20%. The average absolute relative difference in area was 0.15%.

14.
Peak alignment using wavelet pattern matching and differential evolution   Cited by: 1 (self-citations: 0, citations by others: 1)
Zhang ZM  Chen S  Liang YZ 《Talanta》2011,83(4):1108-1117
Retention time shifts badly impair the qualitative and quantitative results of chemometric analyses when entire chromatographic data are used. Hence, chromatograms should be aligned before further analysis. For this purpose, a practical peak alignment method (alignDE) for one-way chromatograms is proposed and implemented. It consists of five steps: (1) equalization of chromatogram lengths using linear interpolation; (2) accurate peak pattern matching by continuous wavelet transform (CWT) with the Mexican Hat and Haar wavelets as mother wavelets; (3) flexible baseline fitting using penalized least squares; (4) peak clustering when the gap between two peaks is smaller than a certain threshold; (5) peak alignment using differential evolution (DE) to maximize the linear correlation coefficient between the reference signal and the signal to be aligned. The method is demonstrated with both simulated and real chromatograms, for example, chromatograms of fungal extracts and Red Peony Root obtained by HPLC-DAD. It is implemented in the R language and is available as open source software to a broad range of chromatography users (http://code.google.com/p/alignde).

15.
When discriminating herbal medicines with pattern recognition based on chromatographic fingerprints, the majority of variables (data points) typically contain no discrimination information. In this paper, several chemometric approaches were investigated to extract representative variables: forward selection and key set factor analysis using principal component analysis (PCA), unweighted and weighted methods based on the inner- and outer-variances, and the Fisher coefficient from the between- and within-class variations. The number of variables retained was determined from the cumulative variance percent of the principal components, the ratio of observations to variables, and the factor indicator function (IND). To assess the variable selection methods and the criteria levels used to determine the number of variables retained, the original and reduced datasets were compared with Procrustes analysis and a weighted measure of similarity. Moreover, tri-variate plots of the first three PCA scores were used to visually examine the reduced datasets in low-dimensional space. Herbal samples were finally discriminated by Bayes discrimination analysis with the reduced subsets. The case study of 79 herbal samples showed that forward selection associating the variables with the loadings closest to 0 and key set factor analysis were preferable for determining the representative variables. Procrustes analysis and the weighted measure were not useful indicators for extracting representative variables. High matching between the original and reduced datasets did not imply high prediction accuracy. In the PC1-PC2-PC3 score projection plots of the reduced subsets, not all the herb samples could be separated, owing to the complexity of the chromatographic fingerprints.

16.
《Electrophoresis》2018,39(11):1399-1409
The precursor compounds related to the bitterness of beer are called α-acids. These compounds are extracted from hops, an important ingredient in the brewing process, and were analyzed here by capillary electrophoresis. The electrophoretic method used 160 mmol/L ammonium carbonate (pH 9) as the background electrolyte (BGE) and a voltage of +20 kV in a capillary with an internal diameter of 50 μm and a total length of 62.5 cm (54 cm effective). The samples were injected in hydrodynamic mode by applying a pressure of 25 mbar for 5 s, and the analytes were detected at 230 nm. A hydromethanolic extraction for 3 h using MeOH/H2O 80:20 v/v as the extraction solution was considered the optimum condition for sample preparation. Under the optimized conditions, the electropherograms were evaluated for use as input to chemometric modeling. Preprocessing of the electrophoretic data, covering alignment, denoising, baseline correction, and variable selection, was investigated before chemometric modeling by principal component analysis (PCA). The electrophoretic data were systematically evaluated to find the optimum conditions for modeling. PCA was carried out for all tests using different preprocessing methods, and an explained variance higher than 90% was achieved in all of them. The optimized chemometric method worked with aligned and mean-centered data. With this approach, a simple and efficient method to classify hop samples with high and low α-acid content from a simple electrophoretic analysis, without the use of analytical standards, was established.

17.
Ginseng is a well‐known traditional Chinese medicinal herb, and ginsenosides are its major active components. A method for the fast determination of ginsenosides in ginseng samples by high‐performance liquid chromatography was developed and used for the quantitative analysis of four ginsenosides in three different ginseng samples. In this method, instead of time‐consuming gradient elution, isocratic elution was used to speed up the analysis. Under strong isocratic elution, all the ginsenosides are eluted in 2.3 min. Although the measured signal is composed of overlapped peaks with the interferences and background, the signal of ginsenosides can be extracted by chemometric resolution. A non‐negative immune algorithm was employed to obtain the chromatographic information of the target components from the data. Compared with conventional chemometric approaches, the method can perform the extraction for one‐dimensional overlapping signals. The method was validated by the determination of four ginsenosides in three different ginseng samples. The recoveries of the spiked samples were in the range of 94.08–107.3%.

18.
LC/MS is an analytical technique that, due to its high sensitivity, has become increasingly popular for the generation of metabolic signatures in biological samples and for the building of metabolic data bases. However, to be able to create robust and interpretable (transparent) multivariate models for the comparison of many samples, the data must fulfil certain specific criteria: (i) that each sample is characterized by the same number of variables, (ii) that each of these variables is represented across all observations, and (iii) that a variable in one sample has the same biological meaning or represents the same metabolite in all other samples. In addition, the obtained models must have the ability to make predictions of, e.g. related and independent samples characterized accordingly to the model samples. This method involves the construction of a representative data set, including automatic peak detection, alignment, setting of retention time windows, summing in the chromatographic dimension and data compression by means of alternating regression, where the relevant metabolic variation is retained for further modelling using multivariate analysis. This approach has the advantage of allowing the comparison of large numbers of samples based on their LC/MS metabolic profiles, but also of creating a means for the interpretation of the investigated biological system. This includes finding relevant systematic patterns among samples, identifying influential variables, verifying the findings in the raw data, and finally using the models for predictions. The presented strategy was here applied to a population study using urine samples from two cohorts, Shanxi (People's Republic of China) and Honolulu (USA). The results showed that the evaluation of the extracted information data using partial least square discriminant analysis (PLS-DA) provided a robust, predictive and transparent model for the metabolic differences between the two populations. 
The presented findings suggest that this is a general approach for data handling, analysis, and evaluation of large metabolic LC/MS data sets.

19.
In this study, the combination of chemometric resolution and cubic spline data interpolation was investigated as a method to correct the retention time shifts in chromatographic fingerprints of herbal medicines obtained by high-performance liquid chromatography-diode array detection (HPLC-DAD). With the help of chemometric resolution approaches, it was easy to identify the purity of chromatographic peak clusters and then resolve the two-dimensional response matrix into chromatograms and spectra of pure chemical components, so as to select multiple marker compounds in the chromatographic fingerprints. Once these marker components were determined, the retention time shifts of the chromatographic fingerprints could be corrected effectively. After this correction, cubic spline interpolation was used to reconstruct new chromatographic fingerprints. The results showed that identifying the purity of the chromatographic peak clusters, together with resolving overlapping peaks into pure chromatograms and spectra by chemometric approaches, could provide sufficient chromatographic and spectral information for selecting multiple marker compounds to correct the retention time shifts. The cubic spline data interpolation technique was convenient for reconstructing the corrected chromatographic fingerprints. The successful application to simulated and real chromatographic fingerprints of two Cortex cinnamomi, fifty Rhizoma chuanxiong, ten Radix angelicae, and seventeen Herba menthae samples from different sources demonstrated the reliability and applicability of the approach. Pattern recognition based on principal component analysis, used to identify inhomogeneity in the chromatographic fingerprints of real herbal medicines, could further support this conclusion.

20.
Miao L  Cai W  Shao X 《Talanta》2011,83(4):1247-1253
Applications of hyphenated chromatographic techniques, especially GC-MS, have been reported in chemical, biological, environmental, agricultural, and medical analysis. The complexity of the samples in these fields is still an obstacle to the practical use of the technique, and the overlapping of multicomponent signals has led to the wide use of chemometric methods. In this work, taking the rapid analysis of a pesticide mixture as an example, a chemometric approach was proposed for the resolution of multicomponent overlapping GC-MS signals. In the method, a mass spectral library of pesticides was first organized, then target factor analysis (TFA) was employed to test for the presence of a specific pesticide in the multicomponent overlapping GC-MS signal, and finally the chromatographic information of the pesticide was extracted by a non-negative immune algorithm (IA). A GC-MS signal of a 40-component pesticide mixture eluted within 9 min was analyzed by the method. The mass spectra and chromatographic profiles of almost all the pesticides could be obtained.
