Similar literature
 20 similar articles found (search time: 31 ms)
1.
There are many algorithms for detecting epistatic interactions in GWAS. However, most of these algorithms are applicable only to two-locus interactions. Some are designed from the outset to detect only two-locus interactions. Others place no limit on the order of interactions, but in practice take a very long time to detect higher order interactions in real GWAS data. Even the better ones take days to detect higher order interactions in WTCCC data. We propose a fast algorithm for detection of high order epistatic interactions in GWAS. It runs the k-means clustering algorithm on the set of all SNPs. Candidates are then selected from each cluster and examined to find the causative SNPs of k-locus interactions. We use mutual information from information theory as the measure of association between genotypes and phenotypes. We tested the power and speed of our method on extensive sets of simulated data. The results show that our method has equal or greater power and runs much faster than previously reported methods. We also applied our algorithm to each of the seven diseases in the WTCCC data to analyze up to 5-locus interactions. It takes only a few hours to analyze 5-locus interactions in one dataset. From the results we make some interesting and meaningful observations on each disease in the WTCCC data. In this study, a simple yet powerful two-step approach is proposed for fast detection of high order epistatic interactions. Our algorithm makes it possible to detect high order epistatic interactions in GWAS in a matter of hours on a PC.
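The association measure named in this abstract, mutual information between a k-locus genotype combination and the phenotype, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mutual_information`, the 0/1 genotype coding, and the toy data are assumptions made for the example.

```python
from collections import Counter
from math import log2

def mutual_information(genotypes, phenotypes):
    """I(G; P) in bits; each genotype is one hashable value (e.g. a tuple of k SNP codes)."""
    n = len(genotypes)
    joint = Counter(zip(genotypes, phenotypes))
    g_counts = Counter(genotypes)
    p_counts = Counter(phenotypes)
    mi = 0.0
    for (g, p), c in joint.items():
        # p(g, p) * log2( p(g, p) / (p(g) * p(p)) ), written with raw counts
        mi += (c / n) * log2(c * n / (g_counts[g] * p_counts[p]))
    return mi

# Hypothetical toy data: 2-locus genotypes for 8 individuals, phenotype 1 = case.
genotypes = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1)]
phenotypes = [0, 1, 1, 0, 0, 1, 1, 0]  # determined by the two loci jointly

mi_joint = mutual_information(genotypes, phenotypes)
mi_single = mutual_information([g[0] for g in genotypes], phenotypes)
```

In this toy case the phenotype depends on the two loci only in combination, so the joint score is 1 bit while the single-locus score is 0. This is the kind of purely epistatic effect that single-SNP screening cannot see.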

2.
Summary: A pattern recognition methodology has been developed for analysis of chromatographic data. The method uses a new class of multidimensional orthogonal polynomials developed by Cohen in conjunction with a supervised learning technique. The method is applicable to any chromatographic data for which classification into two or more categories is desired. The algorithm analyzes both elution times and peak areas. An application is shown for the analysis of organic acids in ascitic fluid obtained from patients with liver disorders. Classification of these patients for presence or absence of bacterial infection is over ninety percent correct.

3.
A signal-processing method known as spectral correlative chromatography (SCC) for two-dimensional data obtained from hyphenated chromatography is developed and applied to chemical chromatographic fingerprint data sets of herbal medicine under specific experimental conditions. The method can judge the presence or absence of a spectral correlative peak among the spectrochromatograms. A local least squares regression model (LLS) is constructed in a piecewise manner to correct the shifts of retention time of some peaks of interest in the chromatograms of various test samples. The results compare favorably with those obtained by a two-point calibrated algorithm. It is shown that performing SCC and LLS on the piecewise clusters of various chromatographic fingerprints is more helpful in practice in revealing their common nature and for characterizing the chemical constituents. This approach holds great potential for facilitating quality control of herbal medicines.

4.
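贾梦涵, 回朝妍, 张辉, 高宇, 佟美琪, 马仡男. 《色谱》 (Chinese Journal of Chromatography), 2021, 39(6): 670-677
Peak detection and analysis play a very important role in chromatography research, but varying degrees of noise interference during the acquisition and transmission of chromatographic data make peak detection extremely difficult. Traditional peak detection algorithms generally predefine peak shapes by means of baseline subtraction and divide peaks into several types, such as single peaks and overlapping peaks. Different detection methods are then applied to the different peak types, which gives traditional peak detection algorithms high complexity, a low degree of automation...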

5.
Metabolic datasets, analyzed with statistical procedures, can provide an overview of herbs of different origin. Such results often deviate to a certain degree because of peak shifts in the chromatographic signals. To solve this problem, an improved algorithm that combines sub-window factor analysis with mass spectrum information is proposed. The algorithm locates peaks with a peak detection approach derived from either a multi-scale Gaussian function or the Haar wavelet, each with a different scope of application; the candidate drift points at each peak are estimated by fast Fourier transform cross-correlation; the best drift point at each candidate peak is then confirmed by sub-window factor analysis and mass spectrum information in nontargeted metabolic profiling. Finally, the peak regions are aligned against a reference chromatogram, and the non-peak regions are aligned by linear interpolation. The chromatographic signals of 30 Bupleurum samples were aligned as an illustration of this algorithm, and they could be well distinguished using statistical procedures. The results demonstrate that the presented method outperforms other mass-spectra-based algorithms when aligning co-eluted peaks.
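The drift-estimation step can be sketched as a search for the lag that maximizes the cross-correlation between a reference trace and a sample trace. The abstract uses an FFT to compute this correlation; the sketch below evaluates the same quantity directly, without the FFT speed-up, to stay dependency-free. The helper names and the synthetic single-peak traces are assumptions for illustration only.

```python
from math import exp

def gaussian_trace(center, width, length):
    """Hypothetical single-peak chromatogram sampled at integer points."""
    return [exp(-((i - center) ** 2) / (2 * width ** 2)) for i in range(length)]

def estimate_shift(reference, sample, max_shift):
    """Return the lag (in points) that maximizes the cross-correlation."""
    def correlation(lag):
        total = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(sample):  # ignore points shifted outside the trace
                total += r * sample[j]
        return total
    return max(range(-max_shift, max_shift + 1), key=correlation)

reference = gaussian_trace(center=20, width=2.0, length=50)
sample = gaussian_trace(center=23, width=2.0, length=50)  # peak drifted by +3 points
shift = estimate_shift(reference, sample, max_shift=10)
```

Here `shift` comes out as 3, the drift built into the synthetic sample. For long chromatograms, an FFT-based correlation gives the same lag at a much lower cost than this direct loop.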

6.
The effect of filtering and automated integration of chromatographic data on the calibration curve and detection limit was assessed. In a first approach, simulated chromatograms were used to quantify the effects of data processing. Three types of filters were used: Savitzky-Golay, Fourier and wavelet filters. The filter parameters chosen had been optimized in a previous study. The simulated data were integrated by a commercial software package. The application of the DIN 32645 concept to the determination of the detection limit of chromatographic data is discussed and contrasted with the concept of the method detection limit. Under the conditions investigated, filtering can improve the limit of detection by up to a factor of three. This can be explained by the fact that filtering reduces the variance of the peak area and height, and the limit of detection is mainly determined by their variance. However, the integration algorithm in practice limits the improvement that can be gained by filtering the data.
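For readers unfamiliar with the first of the three filters, a minimal Savitzky-Golay smoother is sketched below. The 5-point window and quadratic fit are assumptions chosen for brevity; the abstract does not state the optimized parameters, and the function name `savgol5` is hypothetical.

```python
# Standard 5-point quadratic/cubic Savitzky-Golay smoothing weights (they sum to 35).
SG5 = (-3, 12, 17, 12, -3)

def savgol5(signal):
    """Smooth interior points; the two points at each end are left unchanged."""
    out = list(signal)
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        out[i] = sum(c * y for c, y in zip(SG5, window)) / 35
    return out

parabola = [float(x * x) for x in range(7)]
smoothed = savgol5(parabola)   # a quadratic passes through unchanged

spike = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
damped = savgol5(spike)        # an isolated noise spike is attenuated
```

The filter reproduces any locally quadratic shape exactly while reducing an isolated one-point spike to 17/35 of its height, which illustrates how such filtering can lower the variance of peak height without distorting smooth peaks.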

7.
The stochastic resonance algorithm, a potential tool for amplifying weak chromatographic peaks, was developed on the basis of a counterintuitive physical phenomenon. As a result, its essential step, parameter optimization, is perplexing and difficult for analysts. To avoid optimizing the system parameters on a case-by-case basis, an improved algorithm is proposed that introduces a constant, or direct current, signal into the signal to be measured as the external force. With the new algorithm, weak chromatographic peaks can be amplified and detected using the same set of parameters. Two sets of our previous experimental data were reanalyzed with the developed algorithm, and the results were satisfactory. The new algorithm is expected to lead to a generalized solution.

8.
An experimental approach for rapid analysis and convenient interpretation of multiparallel experiments is described. Conventional approaches use a series of individual chromatographic runs to produce integrated peak area data, which are stored in individual data files, then transferred to a spreadsheet program and graphed to allow interpretation of experimental results. A simpler and more direct approach utilizes multiple injections within a single chromatographic run to produce a continuous trace of chromatograms, which can often provide a direct visual readout of experimental outcome without the need for peak integration, data transfer, or graphing. In this approach, the chromatogram itself serves as the graph whereby the outcome of the multiparallel experiment can be discerned. The utility of the technique is greatly enhanced by the use of compound-specific detection technologies such as mass spectrometry or chiroptical spectroscopy, and can benefit from experimental designs that facilitate the direct interpretation of results.

9.
Data-independent mass spectrometry activates all ion species isolated within a given mass-to-charge window (m/z) regardless of their abundance. This acquisition strategy overcomes the traditional data-dependent ion selection boosting data reproducibility and sensitivity. However, several tandem mass (MS/MS) spectra of the same precursor ion are acquired during chromatographic elution resulting in large data redundancy. Also, the significant number of chimeric spectra and the absence of accurate precursor ion masses hamper peptide identification. Here, we describe an algorithm to preprocess data-independent MS/MS spectra by filtering out noise peaks and clustering the spectra according to both the chromatographic elution profiles and the spectral similarity. In addition, we developed an approach to estimate the m/z value of precursor ions from clustered MS/MS spectra in order to improve database search performance. Data acquired using a small 3 m/z units precursor mass window and multiple injections to cover a m/z range of 400–1400 was processed with our algorithm. It showed an improvement in the number of both peptide and protein identifications by 8 % while reducing the number of submitted spectra by 18 % and the number of peaks by 55 %. We conclude that our clustering method is a valid approach for data analysis of these data-independent fragmentation spectra. The software including the source code is available for the scientific community.

10.
11.
This work investigates the ability of multiplicative (on the basis of product units) and sigmoidal neural models built by an evolutionary algorithm to quantify highly overlapping chromatographic peaks. To test this approach, two N-methylcarbamate pesticides, carbofuran and propoxur, were quantified using a classic peroxyoxalate chemiluminescence reaction as a detection system for chromatographic analysis. The four-parameter Weibull curve associated with the profile of the chromatographic peak estimated by the Levenberg-Marquardt method was used as input data for both models. Straightforward network topologies (one output) allowed the analytes to be quantified with great accuracy and precision. Product unit neural networks provided better information ability, smaller network architectures, and more robust models (smaller standard deviation). The reduced dimensions of the selected models enabled the derivation of simple quantification equations to transform the input variables into the output variable. These equations can be more easily interpreted from a chemical point of view than those provided by sigmoidal neural networks, and the effect of both analytes on the characteristics of chromatographic bands, namely profile, dispersion, peak height, and residence time, can be readily established.

12.
Mass spectrometry (MS) is a powerful technique for the determination of glycan structures and is capable of providing qualitative and quantitative information. Recent developments in computational methods offer an opportunity to use glycan structure databases and de novo algorithms for extracting valuable information from MS or MS/MS data. However, detecting low-intensity peaks that are buried in noisy data sets is still a challenge, and an algorithm for accurate prediction and annotation of glycan structures from MS data is highly desirable. The present study describes a novel algorithm for glycan structure prediction by matching glycan isotope abundance (mGIA), which takes isotope masses, abundances, and spacing into account. We constructed a comprehensive database containing 808 glycan compositions and their corresponding isotope abundances. Unlike most previously reported methods, we took into account not only the m/z values of the peaks but also the corresponding logarithmic Euclidean distance between the calculated and detected isotope vectors. Evaluation against a linear classifier, obtained by training the mGIA algorithm with datasets of three different human tissue samples from the Consortium for Functional Glycomics (CFG) in association with a Support Vector Machine (SVM), was proposed to improve the accuracy of automatic glycan structure annotation. In addition, an effective data preprocessing procedure, including baseline subtraction, smoothing, peak centroiding and composition matching for extracting correct isotope profiles from MS data, was incorporated. The algorithm was validated by analyzing the mouse kidney MS data from CFG, resulting in the identification of 6 more glycan compositions than the previous annotation and a significant improvement in the detection of weaker peaks compared with the previously reported algorithm.

13.
There is a fundamental difference between data collected in comprehensive two-dimensional gas chromatographic (GCxGC) separations and data collected by one-dimensional GC techniques (or heart-cut GC techniques). This difference can be ascribed to the fact that GCxGC generates multiple sub-peaks for each analyte, as opposed to other GC techniques that generate only a single chromatographic peak for each analyte. In order to calculate the total signal for the analyte, the most commonly used approach is to consider the cumulative area that results from the integration of each sub-peak. Alternatively, the data may be analyzed using higher order techniques such as the generalized rank annihilation method (GRAM). Regardless of the approach, the potential errors are expected to be greater for trace analytes where the sub-peaks are close to the limit of detection (LOD). This error is also expected to be compounded by phase-induced error, a phenomenon foreign to the measurement of single peaks. Here these sources of error are investigated for the first time using both the traditional integration-based approach and GRAM analysis. The use of simulated data permits the sources of error to be controlled and independently evaluated in a manner not possible with real data. The results of this study show that the error introduced by the modulation process is at worst 1% for analyte signals with a base peak height of 10×LOD, whichever approach to quantitation is used. Errors due to phase shifting are shown to be of greater concern, especially for trace analytes with only one or two visible sub-peaks. In this case, the error could be as great as 6.4% for symmetrical peaks when a conventional integration approach is used. By contrast, GRAM provides a much more precise result, at worst 1.8% and 0.6% when the modulation ratio (MR) is 1.5 or 3.0, respectively, for symmetrical peaks.
The data show that for analyses demanding high precision, an MR of 3 should be targeted as a minimum, especially if multivariate techniques are to be used, so as to maintain data density in the primary dimension. For rapid screening techniques where precision is not as critical, lower MR values can be tolerated. When integration is used, if 4-5 visible sub-peaks are included for a symmetrical peak at MR=3.0, the data will be reasonably free from phase-shift-induced errors or a negative bias. At MR=1.5, at least 3 sub-peaks must be included for a symmetrical peak. The proposed guidelines should be equally relevant to LCxLC and other similar techniques.

14.
Recently, chromatographic fingerprinting has become one of the most powerful approaches to quality control of herbal medicines. However, the performance of reported chromatographic fingerprinting constructed from a single chromatogram sometimes turns out to be inadequate for complex herbal medicines, such as multi-herb botanical drug products. In this study, multiple chromatographic fingerprinting, which consists of more than one chromatographic fingerprint and represents the full chemical characteristics of the complex medicine, is proposed as a potential strategy for this complicated case. As a typical example, a binary chromatographic fingerprinting of “Danshen Dropping Pill” (DSDP), the best-selling traditional Chinese medicine in China, was developed. First, two HPLC fingerprints that represent the chemical characteristics of the depsides and saponins of DSDP, respectively, were developed and used to construct binary chromatographic fingerprints of DSDP. The binary fingerprints were then authenticated and validated. Next, a data-level information fusion method was employed to capture the chemical information encoded in the two chromatographic fingerprints. Based on the fusion results, lot-to-lot consistency and fraud can be determined using either a similarity measure or a chemometric approach. The application of binary chromatographic fingerprinting to consistency assessment and fraud detection of DSDP clearly demonstrated that the proposed method is a powerful approach to quality control of complex herbal medicines.

15.
Hyphenated techniques such as gas chromatography–mass spectrometry (GC–MS) or high-performance liquid chromatography–mass spectrometry (LC–MS) produce a large amount of data in the form of a two-way data matrix. Extracting as much useful information as possible from these data has been a great challenge. In this work, a chemometric approach based on a modification of the adaptive immune algorithm (AIA) was proposed for high-throughput analysis of multicomponent overlapping GC–MS signals. With the proposed method, the chromatographic profile of each component in an overlapping signal can be extracted independently and sequentially along the retention time. To show the efficiency of the method, a simulated GC–MS data set of six components with background and an experimental GC–MS data set of 40 pesticides were investigated. It was found that the multicomponent overlapping GC–MS signals could be resolved quickly and accurately. Furthermore, the quantitative property of the extracted information was also investigated. The correlation coefficients (r) between the peak area and the added volumes of the sample are in the range 0.9658–0.9953.

16.
Chromatographic detection responses are recorded digitally. A peak is represented ideally by a Gaussian distribution. Raising a Gaussian distribution to the power ‘n’ raises the height of the peak to that power, but decreases the standard deviation by a factor of √n. Hence there is an increasing disparity in detection responses as the signal rises above low level noise, with a corresponding decrease in peak width. This increases the S/N ratio and increases peak to peak resolution. The ramifications of these factors are that poor resolution in complex chromatographic data can be improved, and low signal responses embedded at near noise levels can be enhanced. The application of this data treatment process is potentially very useful in 2D-HPLC, where sample dilution occurs between dimensions, reducing signal response, and in the application of post-column reaction detection methods, where band broadening is increased by virtue of reaction coils. In this work power functions applied to chromatographic data are discussed in the context of (a) complex separation problems, (b) 2D-HPLC separations, and (c) post-column reaction detectors.
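The height and width claims in this abstract follow from a one-line identity. The symbols are chosen here for illustration and are not taken from the abstract: h is the peak height, μ the retention time of the peak maximum, and σ the standard deviation.

```latex
\left[ h \exp\!\left( -\frac{(t-\mu)^2}{2\sigma^2} \right) \right]^{n}
  = h^{n} \exp\!\left( -\frac{n\,(t-\mu)^2}{2\sigma^2} \right)
  = h^{n} \exp\!\left( -\frac{(t-\mu)^2}{2\left(\sigma/\sqrt{n}\right)^{2}} \right)
```

The result is again a Gaussian centered at μ, with height h^n and standard deviation σ/√n. Squaring a trace (n = 2), for example, narrows an ideal peak by a factor of about 1.41.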

17.
A method is proposed for the determination of chromatographic peak purity by means of principal component analysis (PCA) of high-performance liquid chromatography with diode array detection (HPLC-DAD) data. The method is exemplified with analysis of binary mixtures of lidocaine and prilocaine with different levels of separation. Lidocaine and prilocaine have very similar spectra and the chromatograms used had substantial peak overlap. The samples analysed contained a constant amount of lidocaine and a minor amount of prilocaine (0.02-2 conc.%) and hence the focus was on determining the purity of the lidocaine peak in the presence of much smaller levels of prilocaine. The peak purity determination was made by examination of relative observation residuals, scores and loadings from the PCA decomposition of DAD data over a chromatographic peak. As a reference method, the functions for peak purity analysis in the chromatographic data system used (Chromeleon) were applied. The PCA method showed good results at the same level as the detection limit of baseline-separated prilocaine, outperforming the methods in Chromeleon by a factor of ten. There is a discussion of the interpretation of the result, with some comparisons with evolving factor analysis (EFA). The main advantage of the PCA method for determination of peak purity over methods like EFA lies in its simplicity, short time of calculation and ease of use.

18.
A new strategy is reported for extracting complete and partial sequence information from collision-induced dissociation (CID) spectra of peptides. CID spectra are obtained from high energy CID of peptide molecular ions on a four-sector tandem mass spectrometer with an electro-optically coupled microchannel array detector. A peak detection routine reduces the spectrum to a list of peak masses and peak heights, which is then used for sequencing. The sequencing algorithm was designed to use spectral data to generate sequence fits directly rather than to use data to test the fit of a series of sequence guesses. The peptide sequencing algorithm uses a pattern based on the polymeric nature of peptides to classify spectral peaks into sets that are related in a sequence-independent manner. It then establishes sequence relationships among these sets. Peak detection from raw data takes 10–20 s, with sequence generation requiring an additional 10–60 s on a Sun 3/60 workstation. The program is written in the C language to run on a Unix platform. The principal advantages of our method are the speed of analysis and the potential for identifying modified or rare amino acids. The algorithm was designed to permit real-time sequencing but awaits hardware modifications to allow real-time access to CID spectra.

19.
Modern chromatographic data acquisition software often behaves as a black box in which researchers have little control over the raw data processing. One of the significant interests of separation scientists is to extract physico-chemical information from chromatographic experiments and peak parameters. In addition, column developers need total peak shape analysis to characterize the flow profile in chromatographic beds. Statistical moments offer a robust approach for providing detailed information on peaks in terms of area, center of gravity, variance, resolution, and skew without assuming any peak model or shape. Despite their utility and theoretical significance, statistical moments are rarely incorporated because they often give underestimated or overestimated results owing to an inappropriate choice of integration method and integration limits. The Gaussian model is used almost universally in chromatography software to assess efficiency, resolution, and peak position. Herein we present a user-friendly and accessible approach for calculating the zeroth, first, second, and third moments through more accurate numerical integration techniques (the trapezoidal and Simpson's rules), which provide a more accurate estimate of peak parameters than rectangular integration. An Excel template is also provided which can calculate the four moments in three steps with or without baseline correction.
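The moment calculation described in this abstract can be sketched with trapezoidal integration in a few lines. This is an illustrative sketch rather than the authors' Excel template: the function names, the assumption that the baseline has already been subtracted, and the synthetic Gaussian test peak are all choices made for the example.

```python
from math import exp, pi, sqrt

def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over (possibly uneven) abscissae x."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))

def peak_moments(t, s):
    """Return area, centroid, variance and skew of a baseline-corrected peak."""
    area = trapezoid(s, t)                                              # zeroth moment
    centroid = trapezoid([ti * si for ti, si in zip(t, s)], t) / area   # first moment
    variance = trapezoid([(ti - centroid) ** 2 * si for ti, si in zip(t, s)], t) / area
    third = trapezoid([(ti - centroid) ** 3 * si for ti, si in zip(t, s)], t) / area
    skew = third / variance ** 1.5                                      # normalized third moment
    return area, centroid, variance, skew

# Hypothetical test peak: unit-height Gaussian, retention time 5.0, sigma 0.5.
t = [i * 0.01 for i in range(1001)]
s = [exp(-((ti - 5.0) ** 2) / (2 * 0.5 ** 2)) for ti in t]
area, centroid, variance, skew = peak_moments(t, s)
```

For this symmetric test peak the sketch recovers the centroid 5.0, the variance 0.25 and a skew of essentially zero. On real data the integration limits and baseline correction matter far more than the integration rule, which is the point the abstract makes about under- and overestimation.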

20.
A novel approach for CE data analysis based on pattern recognition techniques in the wavelet domain is presented. Low-resolution, denoised electropherograms are obtained by applying several preprocessing algorithms, including denoising, baseline correction, and detection of the region of interest in the wavelet domain. The resultant signals are mapped into character sequences using first derivative information and multilevel peak height quantization. Next, a local alignment algorithm is applied to the coded sequences for peak pattern recognition. We also propose 2-D and 3-D representations of the patterns found for fast visual evaluation of the variability of the concentrations of chemical substances in the analyzed samples. The proposed approach is tested on the analysis of intracerebral microdialysate data obtained by CE and LIF detection, achieving a correct detection rate of about 85% with a processing time of less than 0.3 s per 25,000-point electropherogram. Using a local alignment algorithm on low-resolution denoised electropherograms might have a great impact on high-throughput CE, since the proposed methodology substitutes fast automatic pattern recognition for slow, time-consuming, human-based visual pattern recognition.
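The coding and alignment steps can be illustrated with a deliberately simplified sketch. It codes only the sign of the first difference (U = up, D = down, F = flat) and omits the multilevel peak height quantization the abstract describes; the local alignment is a plain Smith-Waterman score with a linear gap penalty. The function names, scoring values and toy traces are assumptions for the example.

```python
def encode(signal, flat_tol=0.05):
    """Map each first difference to U (up), D (down) or F (flat)."""
    codes = []
    for a, b in zip(signal, signal[1:]):
        d = b - a
        codes.append("F" if abs(d) <= flat_tol else ("U" if d > 0 else "D"))
    return "".join(codes)

def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best

# Two hypothetical traces containing the same peak at different migration times.
trace_a = [0, 0, 1, 2, 1, 0, 0, 0]
trace_b = [0, 0, 0, 0, 1, 2, 1, 0]
code_a = encode(trace_a)
code_b = encode(trace_b)
score = local_alignment_score(code_a, code_b)
```

Because the alignment is local, the shared peak pattern is found even though it sits at different positions in the two traces, which is what makes this kind of coding tolerant of migration-time shifts.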
