Similar literature
20 similar articles found.
1.
G. Reich 《Chromatographia》1987,24(1):659-665
Summary: The application of a newly developed peak recognition algorithm is shown. This algorithm is based on the KNN method, one of the pattern recognition methods. It is shown that peaks with an S/N ratio down to one can be safely recognized. This is also possible if the baseline contains not only detector noise but also other disturbances, e.g., noise signals generated by a reaction detector. The recognition ability of the algorithm is demonstrated with a standard chromatogram at three different concentrations and two different sampling rates. The improvement over the classical algorithm is demonstrated. Some properties of the algorithm are discussed.

2.
Distance metrics facilitate a number of methods for statistical analysis. For statistical mechanical applications, it is useful to be able to compute the distance between two different orientations of a molecule. However, a number of distance metrics for rotation have been employed, and in this study, we consider different distance metrics and their utility in entropy estimation using the k‐nearest neighbors (KNN) algorithm. This approach shows a number of advantages over entropy estimation using a histogram method, and the different approaches are assessed using uniform randomly generated data, biased randomly generated data, and data from a molecular dynamics (MD) simulation of bulk water. The results identify quaternion metrics as superior to a metric based on the Euler angles. However, it is demonstrated that samples from MD simulation must be independent for effective use of the KNN algorithm and this finding impacts any application to time series data. © 2013 Wiley Periodicals, Inc.
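As an illustration of what a quaternion-based rotation distance can look like, the sketch below computes the rotation angle between two orientations given as unit quaternions. This is one commonly used quaternion metric and is only an assumption here; the abstract does not say which quaternion metrics were tested. The function name and the example quaternions are hypothetical.

```python
import math

def quaternion_angle(q1, q2):
    """Rotation angle (radians) separating two orientations given as unit quaternions (w, x, y, z)."""
    # q and -q describe the same rotation, so the absolute value of the dot product is used.
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    # Clamp to guard against rounding slightly above 1.0 before calling acos.
    dot = min(1.0, dot)
    return 2.0 * math.acos(dot)

# Illustrative orientations (made up): identity and a 90 degree rotation about z.
identity = (1.0, 0.0, 0.0, 0.0)
rot_z_90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
angle = quaternion_angle(identity, rot_z_90)
```

A distance of this kind could then serve as the neighbor distance inside a KNN entropy estimator, which is not shown here.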

3.
Problems in automated peak recognition in chromatography are discussed. An algorithm based on the k-nearest neighbour technique is proposed. Recognition of a peak is done by comparing it with a predefined profile function (normally a Gaussian peak profile). The profile and a part of the chromatogram are both interpreted as points in a multi-dimensional pattern space. The distance between the two points gives the value of the peak recognition function. The effects of different properties of chromatographic peaks (i.e., peak width, peak height and noise) and of the profile parameter (i.e., dimension of the pattern space, shape and width of the function, and characteristics of the distance measure) are evaluated. The method has excellent properties for recognizing peaks with low signal/noise (S/N) ratios; an example with S/N = 1 is shown. Changing peak widths and drifting baselines have little effect on the recognition ability. Difficulties with changing peak heights can be compensated by range scaling. Problems occur when two peaks are not sufficiently separated.
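The distance-to-profile idea described above can be sketched roughly as follows. This is a minimal illustration, not the published algorithm: the Euclidean distance, the window length of 11 points, the profile width, and the synthetic noise-free signal are all assumptions made for the example. Each window of the chromatogram and the Gaussian profile are range-scaled and treated as points in an 11-dimensional space; a small distance suggests a peak.

```python
import math

def gaussian_profile(n_points, sigma):
    """Gaussian reference profile sampled at n_points, centred in the window."""
    center = (n_points - 1) / 2
    return [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(n_points)]

def range_scale(values):
    """Scale values to [0, 1] so that peak height and baseline offset drop out."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def recognition_function(signal, profile):
    """Euclidean distance between each range-scaled window and the range-scaled profile."""
    ref = range_scale(profile)
    n = len(ref)
    return [math.dist(range_scale(signal[start:start + n]), ref)
            for start in range(len(signal) - n + 1)]

# Synthetic, noise-free chromatogram (made up): baseline 5.0, one peak of height 2.0 at index 30.
signal = [5.0 + 2.0 * math.exp(-0.5 * ((i - 30) / 2.0) ** 2) for i in range(60)]
distances = recognition_function(signal, gaussian_profile(11, 2.0))
best_start = distances.index(min(distances))  # window whose centre lies on the peak
```

In this toy case the minimum falls on the window starting at index 25, whose centre is the peak apex at index 30. With real, noisy data a threshold on the distance would be needed, and its value would have to be chosen empirically.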

4.
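Global matching of chromatographic fingerprint peaks based on graph theory   Total citations: 2, self-citations: 0, citations by others: 2
Ni Lijun, Wang Guodong, Guo Jia, Zhang Liguo 《分析化学》(Chinese Journal of Analytical Chemistry) 2006,34(10):1454-1458
Based on integration data from a chromatography workstation, a domain of permissible matching peak groups is defined, and a formula is proposed for computing, within this domain, the distance matrix between matching peak groups from chromatographic peak areas and retention times, thereby forming a directed graph. The shortest-path algorithm from graph theory is used to find the best matching peak groups among the possible ones. Using optimized matching parameters, the HPLC chromatograms of Zhenju Jiangya tablets, Salvia miltiorrhiza (Danshen) extract, saikosaponin extract, and ginsenoside extract were matched, and the results were compared with the automatic matching results of the fingerprint software released by the Chinese Pharmacopoeia Commission. The results show that the algorithm matches possible chromatographic peaks to the greatest extent, very rarely produces mismatched or missed peak groups, and requires no manual correction.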

5.
This study develops a methodology based on NIR-microscopy analysis and chemometric tools for the detection of animal protein by-products in mixtures, such as compound feeds and mixtures of ingredients, using a library of animal meal by-products only. The proposed methodology is a two-step strategy which worked better than the SIMCA approach it was compared with. In the first step, animal particles are identified using one of two methods, a global or a local distance measure. In the second, K-nearest-neighbours (KNN) is used to discriminate between terrestrial and fish particles. The models were developed using a training set comprising 11,727 spectra of pure terrestrial meals and 5843 of fish meals. KNN using second derivative spectra and five neighbours correctly classifies 98.5% of these samples under cross-validation. The procedure was validated using two external datasets, one made up of mixtures of species (fish and bovine), and a second of commercial compound feeds. The results obtained confirm that the procedure is able to reliably detect the presence of animal meals, although further work would be needed to develop it into an accurate quantitative method.

6.
《Vibrational Spectroscopy》2010,52(2):276-282
Combinations of NIR spectroscopy with three classification algorithms, i.e., multi-class support vector machine (BSVM), k-nearest neighbor (KNN) and soft independent modeling of class analogies (SIMCA), were explored for discriminating different brands of cigarettes. The influence of the training set size on the relative performance of each algorithm was also investigated. A NIR spectral dataset involving the classification of cigarettes of three brands was used for illustration. Three performance criteria based on the “correctly classified rate (CCR)”, i.e., “Average CCR”, “95 percentile of CCR” and “S.D. of CCR”, were defined to compare the algorithms. It was found that BSVM is significantly better than KNN or SIMCA in the statistical sense, especially when the training set is relatively small. The results suggest that NIR spectroscopy together with BSVM could be an alternative to traditional methods for discriminating different brands of cigarettes.

7.
A hybrid computational approach was employed for simulation of molecular separation using polymeric membranes. The considered system is a cylindrical membrane module in which the mass transfer equations were solved numerically using CFD (Computational Fluid Dynamics) to obtain the concentration of the species; the CFD simulation results were then used as the inputs for several machine learning models to obtain the hybrid model. The dataset has more than 2000 data points and two input features (r and z). The only output is C, the concentration of the species in the feed channel of the membrane module. KNN (K nearest neighbor), PLSR (Partial Least Square Regression), and SGD (Stochastic Gradient Descent) are the models employed in this research to analyze this data set. The models were optimized with respect to their hyper-parameters and finally evaluated with different statistical metrics. The MAE is 3.4, 5.1, and 5.5 for KNN, SGD, and PLSR, respectively, and the corresponding coefficients of determination (R²) are 0.998, 0.997, and 0.896. Based on the overall results, KNN with K = 8 is selected as the best model in this study for simulation of the membrane system. The final maximum error is 1.35E+02.
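A KNN regressor of the kind selected above (K = 8, inputs r and z, output C) can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not give the distance measure or neighbor weighting, so plain Euclidean distance and an unweighted mean are assumed, and the grid data and concentration formula below are made up to stand in for CFD output.

```python
import math

def knn_predict(train_X, train_y, query, k=8):
    """Predict the output at `query` as the unweighted mean of its k nearest training targets."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Made-up stand-in for CFD output: an 11 x 11 grid of (r, z) points
# with an arbitrary smooth "concentration" C defined on it.
train_X = [(r / 10, z / 10) for r in range(11) for z in range(11)]
train_y = [100.0 * (1.0 - 0.5 * r * r) * math.exp(-z) for r, z in train_X]

c_estimate = knn_predict(train_X, train_y, (0.33, 0.71), k=8)
```

In practice K would be chosen by cross-validation against held-out CFD points, as the hyper-parameter optimization mentioned above suggests.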

8.
With the exponential growth of genome databases, the importance of phylogenetics has increased dramatically over the past years. Studying phylogenetic trees enables us not only to understand how genes, genomes, and species evolve, but also helps us predict how they might change in future. One of the crucial aspects of phylogenetics is the comparison of two or more phylogenetic trees. There are different metrics for computing the dissimilarity between a pair of trees. The Robinson-Foulds (RF) distance is one of the widely used metrics on the space of labeled trees. The distribution of the RF distance from a given tree has been studied before, but the fastest known algorithm for computing this distribution is a slow, albeit polynomial-time, O(l^5) algorithm. In this paper, we modify the dynamic programming algorithm for computing the distribution of this distance for a given tree by leveraging the number-theoretic transform (NTT), and improve the running time from O(l^5) to O(l^3 log l), where l is the number of tips of the tree. In addition to its practical usefulness, our method represents a theoretical novelty, as it is, to our knowledge, one of the rare applications of the number-theoretic transform for solving a computational biology problem.

9.
We present a methodology for optimization of chromatogram alignment using a class separability measure called the Hotelling trace criterion (HTC). This metric is a multi‐class distance measure that accounts for within‐class and between‐class variation. We chose the correlation optimized warping algorithm as our alignment method and used the HTC to judge the effectiveness of the alignment based on algorithm parameters called segment length and max warp. Biodiesel feedstock samples representing classes of soy, canola, tallow, waste grease, and hybrid were used in our experiments. Fatty acid methyl esters in each biodiesel were separated using gas chromatography‐mass spectroscopy. The entire data set was baseline corrected, aligned, normalized, and mean‐centered prior to principal components (PCs) analysis. The aligned, baseline corrected data sets were used to compute a figure of merit called warping effect, while the PC‐transformed data sets were used to evaluate the HTC. The segment length and max warp parameters that maximized the warping effect and/or HTC were then determined. Scores plots of pairs of PCs, along with 95% confidence ellipses, were created and analyzed. The results demonstrated that the parameters derived from maximizing the HTC more effectively aligned the data, as evidenced by better clustering of the biodiesels in the scores plots. This behavior was robust to the number of PCs used in the computation of the HTC. We conclude that the HTC is an objective measure of alignment quality that allows for optimal class separability and can be applied to optimize other methods of chromatogram alignment. Copyright © 2015 John Wiley & Sons, Ltd.

10.
The combination of information technology and separation sciences opens a new avenue to high sample throughput and is therefore of great interest for bypassing bottlenecks in catalyst screening of parallelized reactors or in reaction optimization using multitier well plates. Multiplexing gas chromatography uses pseudo-random injection sequences derived from Hadamard matrices to perform rapid sample injections, which gives a convoluted chromatogram containing the information of a single sample or of several samples with similar analyte composition. The conventional chromatogram is obtained by applying the Hadamard transform with the known injection sequence; in the case of several samples, an averaged transformed chromatogram is obtained, which can be used in a Gauss–Jordan deconvolution procedure to obtain the single chromatograms of the individual samples. The performance of such a system depends on the modulation precision and on the parameters, e.g. the sequence length and modulation interval. Here we demonstrate the effects of the sequence length and modulation interval on the deconvoluted chromatogram, peak shapes and peak integration for sequences between 9-bit (511 elements) and 13-bit (8191 elements) and modulation intervals Δt between 5 s and 500 ms using a mixture of five components. It could be demonstrated that even for high-speed modulation at time intervals of 500 ms the chromatographic information is very well preserved and that the separation efficiency can be improved by very narrow sample injections. Furthermore, this study shows that the relative peak areas in multiplexed chromatograms do not deviate from those in conventionally recorded chromatograms.

11.
Group-type composition analysis (saturates, aromatics, resins, asphaltenes) of asphaltene-containing petroleum materials is performed using column adsorption-desorption or thin-layer chromatography (TLC) with flame ionization detection under normal-phase conditions, with silica gel as the adsorbent. In a three-step procedure, the TLC chromatogram is developed over a decreasing distance by mobile phases of increasing elution strength (polarity). The n-alkane used in the first step does not dissolve asphaltenes, which leads to the occlusion effect and an underestimation of the percentage of saturated hydrocarbons. In this article, the reverse order of the elution steps is proposed: the solvent polarity is reduced while the chromatogram development distance is increased, in the order dichloromethane:methanol 95:5 v/v, 3 cm; toluene, 6 cm; and n-hexane, 10 cm. The weight of the applied sample was also deliberately reduced to 5 μg for bitumen and 2 μg for asphaltene purity testing. It should be the rule that in stepwise TLC chromatogram development, the first mobile phase is a good solvent for all tested components. The IP 469 procedure should be corrected.

12.
Planar chromatography is a very useful tool for the analysis of a wide range of mixtures. Its capacity for rapid, simultaneous separation of a large number of samples, its low solvent consumption, and its ability to handle crude material yield precise and reliable results in a short time and at low cost. Miniaturization of planar techniques brings many advantages, such as a shorter chromatogram development distance and time and a further reduction in solvent consumption. In addition, it often improves separation parameters and raises the efficiency of the chromatographic system. In this paper, the analysis of a tropane alkaloid mixture from Datura inoxia Mill. extract by the conventional TLC technique was compared with five micro-TLC techniques (short-distance TLC, HPTLC, UTLC, OPLC and ETLC), under chromatographic conditions kept as close as possible, in order to present the capabilities of micro-TLC techniques in plant material analysis.

13.
Liquid chromatography-mass spectrometry (LC/MS) has become the method of choice for characterizing complex mixtures. These analyses often involve quantitative comparison of components in multiple samples. To achieve automated sample comparison, the components of interest must be detected and identified, and their retention times aligned and peak areas calculated. This article describes a simple pairwise iterative retention time alignment algorithm, based on the divide-and-conquer approach, for alignment of ion features detected in LC/MS experiments. In this iterative algorithm, ion features in the sample run are first aligned with features in the reference run by applying a single constant shift of retention time. The sample chromatogram is then divided into two shorter chromatograms, which are aligned to the reference chromatogram the same way. Each shorter chromatogram is further divided into even shorter chromatograms. This process continues until each chromatogram is sufficiently narrow so that ion features within it have a similar retention time shift. In six pairwise LC/MS alignment examples containing a total of 6507 confirmed true corresponding feature pairs with retention time shifts up to five peak widths, the algorithm successfully aligned these features with an error rate of 0.2%. The alignment algorithm is demonstrated to be fast, robust, fully automatic, and superior to other algorithms. After alignment and gap-filling of detected ion features, their abundances can be tabulated for direct comparison between samples.
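The divide-and-conquer scheme described above can be sketched as a short recursion. This is a simplified illustration, not the published algorithm: features are reduced to retention times only (no m/z matching), the split is made at the median feature rather than at a time midpoint, recursion stops at a fixed feature count, and the search window, tolerance, and example retention times are all made-up assumptions.

```python
def best_shift(sample, reference, center, search, tol):
    """Constant shift within `search` of `center` that matches the most sample features."""
    candidates = {r - s for s in sample for r in reference
                  if abs((r - s) - center) <= search}
    best, best_hits = center, -1
    for shift in sorted(candidates):
        # Count sample features that land within `tol` of some reference feature.
        hits = sum(any(abs(s + shift - r) <= tol for r in reference) for s in sample)
        if hits > best_hits:
            best, best_hits = shift, hits
    return best

def align(sample, reference, center=0.0, search=1.0, tol=0.05, min_features=2):
    """Return (retention_time, shift) pairs for the sorted sample retention times."""
    if not sample:
        return []
    shift = best_shift(sample, reference, center, search, tol)
    if len(sample) <= min_features:
        return [(s, shift) for s in sample]
    mid = len(sample) // 2
    # Refine each half around the parent's shift, using a narrower search window.
    return (align(sample[:mid], reference, shift, search / 2, tol, min_features)
            + align(sample[mid:], reference, shift, search / 2, tol, min_features))

# Made-up retention times: early features drift by 0.2, late features by 0.5.
reference = [1.0, 2.3, 3.1, 4.6, 6.2, 7.9, 8.4, 9.7]
sample = [0.8, 2.1, 2.9, 4.4, 5.7, 7.4, 7.9, 9.2]
aligned = align(sample, reference)
```

In this toy example the single global shift can only fit one half of the run, while the recursion recovers a shift of about 0.2 for the early features and about 0.5 for the late ones.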

14.
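In forensic science, the examination and identification of tire rubber particles is particularly important for solving traffic accident cases and some litigation cases. To address the problem that traditional sampling-based analysis destroys physical evidence, and to examine the differences between samples across multiple variables and dimensions, a method for the nondestructive identification of tire rubber based on infrared spectroscopy combined with the K-nearest-neighbor algorithm is proposed. Samples of different brands were collected; their spectra were subjected to automatic baseline correction and normalization and were smoothed and denoised with the Savitzky-Golay algorithm. Dimensionality reduction was used to screen the 840 original features down to 5 identification features. Cross-validation was carried out using the training samples as test samples. With K = 1, and with "feature 3" as the main independent variable and "feature 4", "feature 5", "feature 2" and "feature 1" as covariates serving as classification parameters, the distances between samples were computed with the features weighted by importance, and a classification model was built. The overall classification accuracy of the model reached 83.56%, indicating good discrimination. Combined with further analysis of the samples' infrared spectra, the 73 kinds of samples were finally divided into 10 classes. The results show that infrared spectroscopy combined with the K-nearest-neighbor algorithm can identify and classify tire rubber particles, is broadly applicable and efficient, and offers a useful reference.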
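A KNN classifier with importance-weighted features, as described above, can be sketched as follows. This is a minimal illustration under assumptions: the weighted Euclidean form of the distance, the two-feature toy samples, the brand labels, and the weight values are all made up; the abstract does not give the actual weights or distance formula.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with each squared feature difference scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def knn_classify(train, query, weights, k=1):
    """Majority label among the k training samples nearest to `query`."""
    neighbours = sorted(train, key=lambda item: weighted_distance(item[0], query, weights))[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Made-up reduced features for two hypothetical brands.
train = [((0.1, 0.9), "brand A"), ((0.8, 0.2), "brand B")]
query = (0.2, 0.3)

label_first_heavy = knn_classify(train, query, weights=(0.7, 0.3))   # first feature dominates
label_second_heavy = knn_classify(train, query, weights=(0.1, 0.9))  # second feature dominates
```

The same query is assigned to "brand A" when the first feature carries most of the weight and to "brand B" when the second does, which shows why the choice of feature weights matters for K = 1.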

15.
This work aimed to classify the categories (produced by different processes) and brands (obtained from different geographical origins) of Chinese soy sauces. Nine physico-chemical variables (density, pH, dry matter, ashes, electric conductivity, amino nitrogen, salt, viscosity and total acidity) were measured for 53 soy sauce samples. The measured data were submitted to pattern recognition methods such as cluster analysis (CA), principal component analysis (PCA), discrimination partial least squares (DPLS), linear discrimination analysis (LDA) and K-nearest neighbor (KNN) to evaluate the data patterns and the possibility of differentiating Chinese soy sauces by category and brand. Two clusters corresponding to the two categories were obtained, and each cluster was divided into three subsets corresponding to three brands by the CA method. The variables for LDA and KNN were selected by the Fisher F-ratio approach. The prediction ability of all classifiers was evaluated by cross-validation. Of the three supervised discrimination analyses, LDA and KNN gave 100% correct predictions of the sample category and brand.

16.
Nearly all enzymes are proteins. They are the biological catalysts that accelerate cellular reactions. According to the characteristics of the reactions they catalyze, they are divided into six classes: oxidoreductases (EC-1), transferases (EC-2), hydrolases (EC-3), lyases (EC-4), isomerases (EC-5), ligases (EC-6). Prediction of enzyme classes is of great importance in identifying which enzyme class a protein belongs to. Since the number of enzyme sequences increases day by day, using data mining techniques, rather than experimental analysis, to predict the enzyme class of a newly found enzyme sequence becomes very useful and time-saving. In this paper, two kinds of simple minimum distance-based classifier methods are proposed. These methods and the well-known K-nearest neighbor (KNN) classification algorithm were applied to classify enzymes according to their amino acid composition. Performance measurements and the elapsed time to execute the algorithms were compared. In addition, the equivalence of the two proposed approaches under a special condition is proved, as a guide for researchers.

17.
A new approach for target quantitative analysis in comprehensive two-dimensional gas chromatography (GC × GC), interval Multi-way Partial Least Squares (iNPLS), is presented and evaluated in this paper. In iNPLS, the two-dimensional chromatogram is split into small sections, and each of these pieces is treated as an independent new chromatogram. Separate conventional NPLS calibration models for the concentration of the target analyte are built for each piece of the whole chromatogram, and the best model is selected for quantitative analysis. An algorithm for iNPLS running on the MATLAB platform was written, preliminarily evaluated using solutions of model compounds with different chemical properties, and subsequently applied to quantify some allergens in perfume samples. The results were found to be adequate, and good precision and accuracy were obtained even for poorly resolved peaks.

18.
In order to separate a high‐performance liquid chromatography with diode array detector (HPLC‐DAD) data set into chromatogram peaks and spectra for all compounds, a separation method based on the model of generalized Gaussian reference curve measurement (GGRCM) and the algorithm of multi‐target intermittent particle swarm optimization (MIPSO) is proposed in this paper. A parameter θ is constructed to generate a reference curve r(θ) for a chromatogram peak based on its physical principle. The GGRCM model is proposed to calculate the fitness ε(θ) for every θ, which indicates the possibility that the HPLC‐DAD data set contains a chromatogram peak similar to r(θ). The smaller the fitness, the higher the possibility. The MIPSO algorithm is then introduced to calculate the optimal parameters by minimizing this fitness. Finally, chromatogram peaks are constructed based on these optimal parameters, and the spectra are calculated by an estimator. Through simulations and experiments, the following conclusions are drawn: (i) the GGRCM‐MIPSO method can extract chromatogram peaks from a simulated data set without knowing the number of compounds in advance, even when severe overlap and white noise exist, and (ii) the GGRCM‐MIPSO method can be applied to a real HPLC‐DAD data set. Copyright © 2014 John Wiley & Sons, Ltd.

19.
Ginseng is one of the most important traditional Chinese medicines and functional foods. A method for the fast determination of amino acids in ginseng samples using high performance liquid chromatography (HPLC) was developed, in which strong isocratic elution was employed to simplify the separation and speed up the analysis. All amino acids were eluted within 3 min, with the chromatogram composed of overlapped peaks from the interferences. Then, a non-negative immune algorithm (NNIA) was adopted to resolve the chromatographic signals of the components from the measured chromatogram. The results show that the signals of the amino acids can be correctly extracted by NNIA and that the extracted signals can be used for quantitative analysis. The method was validated by determining six amino acids in four different ginseng samples. The recoveries of the spiked samples are in the range of 96.6%-106.3%.

20.
Metabolic datasets, analyzed with statistical procedures, can provide an overview of different herbal origins. Such results often deviate to a certain degree because of peak shifts in the chromatographic signals. To solve this problem, an improved algorithm combining sub‐window factor analysis with mass spectrum information is proposed. The algorithm uses a peak detection approach derived from either a multi‐scale Gaussian function or the Haar wavelet, each with a different scope of application, to locate the peaks; the candidate drift points at each peak are estimated by fast Fourier transform cross correlation; and the best drift point at each candidate peak is confirmed by sub‐window factor analysis and mass spectrum information in nontargeted metabolic profiling. Finally, the peak regions are aligned against a reference chromatogram, and the non‐peak regions are aligned by linear interpolation. The chromatographic signals of 30 Bupleurum samples were aligned as an illustration of this algorithm, and the samples could be well distinguished using statistical procedures. The result demonstrates that the presented method performs better than other mass‐spectra based algorithms when aligning co‐eluted peaks.
