Similar articles
12 similar articles found.
1.
The development and the validation of innovative approaches for biomarker selection are of paramount importance in many ‐omics technologies. Unfortunately, the actual testing of new methods on real data is difficult, because in real data sets, one can never be sure about the “true” biomarkers. In this paper, we present a publicly available metabolomic ultra performance liquid chromatography–mass spectrometry spike‐in data set for apples. The data set consists of 10 control samples and three spiked sets of the same size, where naturally occurring compounds are added in different concentrations. In this sense, the data set can serve as a test bed to assess the performance of new algorithms and compare them with previously published results. We illustrate some of the possibilities provided by this spike‐in data set by comparing the performance of two popular biomarker‐selection methods, the univariate t‐test and the multivariate variable importance in projection. To promote a widespread use of the data, raw data files as well as preprocessed peak lists are made available. Copyright © 2012 John Wiley & Sons, Ltd.

2.
When quantifying information in metabolomics, the results are often expressed as data carrying only relative information. Vectors of these data have positive components, and the only relevant information is contained in the ratios between their parts; such observations are called compositional data. The aim of the paper is to demonstrate how partial least squares discriminant analysis (PLS‐DA), the most widely used method in chemometrics for multivariate classification, can be applied to compositional data. Theoretical arguments are provided, and data sets from metabolomics are investigated. The data are related to the diagnosis of inherited metabolic disorders (IMDs). The first example analyzes the significance of the corresponding regression parameters (metabolites) using a small data set resulting from targeted metabolomics, where just a subset of potential markers is selected. The second example uses untargeted metabolomics, with an analysis that detected almost 500 metabolites. The significance of the metabolites is investigated by applying PLS‐DA adapted to a compositional approach. In both examples, the significance of important metabolites (markers of diseases) is more clearly visible with the compositional method. Cross‐validation also gives better results when the compositional approach is used. Copyright © 2014 John Wiley & Sons, Ltd.
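The compositional idea can be illustrated with a log-ratio transformation. The following is a minimal sketch, assuming the centered log-ratio (clr) transform as a preprocessing step before a method such as PLS‐DA. The abstract does not state which log-ratio coordinates the authors used, and the function name `clr` and the sample values are hypothetical.

```python
import math


def clr(composition):
    """Centered log-ratio transform of one strictly positive composition.

    Each part is replaced by its log minus the mean of all logs, so only
    the ratios between parts influence the result.
    """
    if not composition:
        raise ValueError("composition must not be empty")
    if any(x <= 0 for x in composition):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical metabolite intensities for one sample, and the same
# sample after a tenfold change in overall scale (e.g. dilution).
sample = [2.0, 6.0, 12.0]
rescaled = [10.0 * x for x in sample]

clr_sample = clr(sample)
clr_rescaled = clr(rescaled)
```

Because clr depends only on the ratios between parts, `clr_sample` and `clr_rescaled` agree up to floating-point rounding, and the transformed values of each sample sum to zero.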

3.
Data from high-throughput metabolomics experiments are becoming increasingly complex, which poses considerable challenges for existing statistical modeling. There is thus a need for statistically efficient approaches to mining the metabolite information contained in metabolomics data. In this work, we developed a novel kernel Fisher discriminant analysis (KFDA) algorithm by constructing an informative kernel based on a decision tree ensemble. The constructed kernel can effectively encode the similarities of metabolomics samples between informative metabolites/biomarkers in specific parts of the measurement space. Simultaneously, informative metabolites or potential biomarkers can be discovered through variable importance ranking while the kernel is built. Moreover, with such a kernel, KFDA can also handle nonlinear relationships in the metabolomics data to some extent. Finally, two real metabolomics data sets and one simulated data set were used to demonstrate the performance of the proposed approach in comparison with other approaches.

4.
Mass spectrometry‐based molecular profiling can be used to better differentiate between normal and cancer tissues and to detect neoplastic transformation, which is of great importance for diagnosing a pathology, predicting its evolution, and developing a treatment strategy. The aim of the present study is to evaluate tissue classification approaches based on various data sets derived from the molecular profile of the organic solvent extracts of a tissue. Several options are considered as input to orthogonal projections to latent structures discriminant analysis (OPLS‐DA): all mass spectrometric peaks over a threshold of 300 counts, a subset of peaks selected by ranking with a support vector machine algorithm, peaks selected by a random forest algorithm, peaks with a statistically significant difference in intensity as determined by the Mann‐Whitney U test, peaks identified as lipids, and peaks that are both identified and significantly different. The best predictive potential is obtained for the OPLS‐DA model built on nonpolar glycerolipids (Q2 = 0.64, area under curve [AUC] = 0.95); the second best is the OPLS‐DA model with lipid peaks selected by the random forest algorithm (Q2 = 0.58, AUC = 0.87). Moreover, models based on particular molecular classes are preferable from a biological point of view, as they can suggest new explanatory mechanisms of pathophysiology and support pathway analysis. Other promising features for OPLS‐DA modeling are phosphatidylethanolamines (Q2 = 0.48, AUC = 0.86).
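The AUC values quoted in this abstract have a simple pairwise interpretation that links them to the Mann‐Whitney U statistic: the AUC equals U divided by the number of (positive, negative) pairs. The following is a minimal sketch of that standard identity, not the authors' pipeline; the function name and the example scores are hypothetical.

```python
def auc_from_scores(positive_scores, negative_scores):
    """Area under the ROC curve computed by pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half.
    This equals the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores for tumour (positive) and normal
# (negative) tissue samples; not data from the study.
tumour = [0.9, 0.8, 0.4]
normal = [0.5, 0.3, 0.2]
auc = auc_from_scores(tumour, normal)
```

The quadratic loop is adequate for the small sample sizes typical of such studies; a rank-based computation would be preferable for large data sets.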

5.
The two-dimensional linear discriminant analysis (2D-LDA) algorithm was originally proposed in the context of face image processing for the extraction of features with maximal discriminant power. However, despite its promising performance in image processing tasks, the 2D-LDA algorithm has not yet been used in applications involving chemical data. The present paper bridges this gap by investigating the use of 2D-LDA in classification problems involving three-way spectral data. The investigation was concerned with simulated data, as well as real-life data sets involving the classification of dry-cured Parma ham according to ageing by surface autofluorescence spectrometry and the classification of edible vegetable oils according to feedstock using total synchronous fluorescence spectrometry. The results were compared with those obtained by using the spectral data with no feature extraction, U-PLS-DA (Partial Least Squares Discriminant Analysis applied to the unfolded data), and LDA employing TUCKER3 or PARAFAC scores. In the simulated data set, all methods yielded a correct classification rate of 100%. However, in the Parma ham and vegetable oil data sets, better classification rates were obtained by using 2D-LDA (86% and 100%), compared with no feature extraction (76% and 77%), U-PLS-DA (81% and 92%), PARAFAC-LDA (76% and 86%) and TUCKER3-LDA (86% and 93%).

6.
Conventional tumor markers are unsuitable for detecting carcinoma at an early stage and lack clinical efficacy and utility. In this study, we investigated the differences in serum metabolite profiles between gastrointestinal cancer patients and healthy volunteers using a metabolomic approach and searched for sensitive and specific metabolomic biomarker candidates. Human serum samples were obtained from esophageal (n = 15), gastric (n = 11), and colorectal (n = 12) cancer patients and healthy volunteers (n = 12). A model for evaluating metabolomic biomarker candidates was constructed using multiple classification analysis, and the results were assessed with receiver operating characteristic curves. Among the 58 metabolites, the levels of nine, five and 12 metabolites were significantly changed in the esophageal, gastric and colorectal cancer patients, respectively, compared with the healthy volunteers. Multiple classification analysis revealed that the variations in the levels of malonic acid and L‐serine largely contributed to the separation of esophageal cancer; gastric cancer was characterized by changes in the levels of 3‐hydroxypropionic acid and pyruvic acid; and L‐alanine, glucuronic lactone and L‐glutamine contributed to the separation of colorectal cancer. Our approach revealed that some metabolites are more sensitive for detecting gastrointestinal cancer than conventional biomarkers. Our study supports the potential of metabolomics as an early diagnostic tool for cancer. Copyright © 2011 John Wiley & Sons, Ltd.

7.
2D gel electrophoresis is a tool for measuring protein regulation, involving image analysis by dedicated software (PDQuest, Melanie, etc.). Here, partial least squares discriminant analysis was applied to improve the results obtained by classic image analysis and to identify the significant spots responsible for the differences between two data sets. A human colon cancer HCT116 cell line was analyzed, both treated and not treated with a new histone deacetylase inhibitor, RC307. The proteins regulated by RC307 were detected by analyzing the total lysates and nuclear proteome profiles. Some of the regulated spots were identified by tandem mass spectrometry. The preliminary data are encouraging, and the protein modulation reported is consistent with the antitumoral effect of RC307 on the HCT116 cell line. Partial least squares discriminant analysis coupled with backward elimination variable selection allowed the identification of a larger number of spots than classic PDQuest analysis. Moreover, it yields the best model performance in terms of prediction and therefore provides more robust and reliable results. From this point of view, the multivariate procedure can be considered a good alternative to standard differential analysis, as it also takes into account the interdependencies among the variables.

8.
A new procedure is presented that markedly enhances the prediction of multivariate calibration models using a small number of interpretable variables. The core of this methodology is to sort the variables according to an informative vector, followed by a systematic investigation of PLS regression models with the aim of finding the most relevant set of variables by comparing the cross‐validation parameters of the models obtained. In this work, seven main informative vectors, i.e. the regression vector, correlation vector, residual vector, variable influence on projection (VIP), net analyte signal (NAS), covariance procedures vector (CovProc) and signal‐to‐noise ratios vector (StN), and their combinations were automated and tested with the main purpose of feature selection. Six data sets from different sources were employed to validate this methodology. They originated from near‐infrared (NIR) spectroscopy, Raman spectroscopy, gas chromatography (GC), fluorescence spectroscopy, quantitative structure‐activity relationships (QSAR) and computer simulation. The results indicate that all vectors and their combinations were able to enhance prediction capability with respect to the full data sets. However, the regression and NAS informative vectors from partial least squares (PLS) regression, both built using more latent variables than were used to build the model, were the best informative vectors for variable selection in most of the tested data sets. In all the applications, the selected variables were quite effective and useful for interpretation. Copyright © 2008 John Wiley & Sons, Ltd.

9.
A study of 3000 kinetic runs for all homogeneous two-step models, under variation of activation and signal parameters, has established that the generated Mechanistic Concentration Code (MCC) is the best vehicle for data extraction in thermal analysis. It summarises the rate-controlling steps and their molecularities, independently of their activation data and (to 80–90%) of their method-specific signal parameters. Hence, an optimum evaluation needs the internal (best-fitted) reference step, the initial concentration of a reference reactant, and equal weight of theoretical and experimental results, reached by using the same algorithms. Thus, the MCC of any measured series generally allows a reliable model determination via its distribution over all two-step models, using the tools of probability and decision theory. A transfer of the strategy to heterogeneous reactions is discussed.
Summary (translated from German): A study of more than 3000 kinetic runs for all homogeneous two-step kinetic models established that the Mechanistic Concentration Code (MCC) is an optimal data carrier in thermal analysis: it describes the rate-determining steps and their molecularity, independently of their activation data and, to 80–90%, of their method-specific signal parameters. An optimal evaluation procedure requires an internal (optimally fitted) reference step, the initial concentration of a reference reactant, and equal standing of theoretical and experimental findings, achieved by using the same algorithms. In general, the MCC from a series of experiments then allows a model determination via a distribution that lists the probabilities of all two-step models according to decision-theoretic criteria. A transfer of the procedure to heterogeneous processes is discussed.

10.
A high-resolution HILIC-MS/MS method was developed to analyze anthranilic acid derivatives of N-glycans released from human serum alpha-1-acid glycoprotein (AGP). The method was applied to samples obtained from 18 patients suffering from high-risk malignant melanoma as well as 19 healthy individuals. It enabled the identification of 102 glycan isomers, separating isomers that differ only in sialic acid linkage (α-2,3, α-2,6) or in fucose positions (core, antenna). Comparative assessment of the samples revealed that upregulation of certain fucosylated glycans and downregulation of their nonfucosylated counterparts occurred in cancer patients. An increased ratio of isomers with more α-2,6-linked sialic acids was also observed. Linear discriminant analysis (LDA) combining the 10 variables with the highest discriminatory power was employed to categorize the samples based on their glycosylation pattern. The performance of the method was tested by cross-validation, resulting in an overall classification success rate of 96.7%. The approach presented here is significantly superior to the serological marker S100B protein in terms of sensitivity and negative predictive power in the population studied. Therefore, it may effectively support the diagnosis of malignant melanoma as a biomarker.

11.
Parallel factor analysis (PARAFAC) has successfully been used in many applications for the analysis of excitation-emission fluorescence data. However, some measurement “artefacts”, such as Rayleigh or Raman scattering, can pose a problem for the extraction of the PARAFAC components and their interpretation. Replacing the spectral zones corresponding to these signals by missing values in the data is not necessarily a method of choice in the cases where informative signals lie in the same wavelength regions. In this article, independent component analysis (ICA) is used on the unfolded cubic array, and the independent components related to the Rayleigh and Raman scattering are identified and removed prior to the reconstruction of the excitation-emission fluorescence data cube. PARAFAC is then applied on these data reconstructed after selective artefact removal, and satisfactory models can be obtained. This procedure, although particularly useful for 3D fluorescence data, may be applied to other types of data as well.

12.
An enhanced pseudotargeted method using a segment data‐dependent acquisition mode based on ultra‐high performance liquid chromatography–tandem mass spectrometry was developed. This segment data‐dependent acquisition‐based pseudotargeted method could improve the detection of co‐eluted ions and extend the coverage of analytes. A set of 502 multiple reaction monitoring channels was obtained by this segment strategy, twice the number created by the traditional data‐dependent acquisition mode. Compared with the untargeted method, the pseudotargeted profiling demonstrated higher sensitivity and higher precision. More than 90% of the metabolites detected by the enhanced pseudotargeted method had relative standard deviations of less than 15%. The method was successfully applied to a metabolomics study of depressed rats treated with liquiritin. Forty‐seven differential metabolites were screened, and five metabolic pathways were found to be related to depression: retinol metabolism; phenylalanine, tyrosine, and tryptophan biosynthesis; phenylalanine metabolism; terpenoid backbone biosynthesis; and lysine degradation. The segment data‐dependent acquisition‐based pseudotargeted method widened the coverage of metabolites with good sensitivity and precision, and it shows great potential for the discovery of differential metabolites in metabolomics studies.
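The precision criterion in this abstract (relative standard deviation below 15%) can be checked per metabolite across repeated measurements. The following is a minimal sketch, assuming replicate intensities are available for each metabolite, for example from repeated quality-control injections. The abstract does not describe the exact protocol, and the function names, metabolite labels and values are hypothetical.

```python
import statistics


def rsd_percent(values):
    """Relative standard deviation (sample SD / mean) in percent."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("RSD is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / abs(mean)


def fraction_below(replicates_by_metabolite, threshold=15.0):
    """Fraction of metabolites whose RSD is below the threshold (percent)."""
    if not replicates_by_metabolite:
        raise ValueError("no metabolites supplied")
    passed = [
        rsd_percent(values) < threshold
        for values in replicates_by_metabolite.values()
    ]
    return sum(passed) / len(passed)


# Hypothetical replicate peak areas; not data from the study.
replicates = {
    "metabolite_a": [100.0, 110.0, 90.0],  # tightly clustered
    "metabolite_b": [100.0, 150.0, 50.0],  # widely scattered
}
share_precise = fraction_below(replicates)
```

`statistics.stdev` uses the sample (n − 1) standard deviation, which is the usual choice for a small number of replicates.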
