Metabolomics data exploration guided by prior knowledge
Authors: Robert A. van den Berg, Carina M. Rubingh, Mariët J. van der Werf
Affiliation:a TNO Quality of Life, P.O. Box 360, 3700 AJ Zeist, The Netherlands
b SymBioSys, Katholieke Universiteit Leuven, Tiensestraat 102, 3000 Leuven, Belgium
c Biosystems Data Analysis, Universiteit van Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, the Netherlands
Abstract: In metabolomics research, it is often important to focus the data analysis on specific areas of interest within the metabolome. In this paper, we describe the application of consensus principal component analysis (CPCA) and canonical correlation analysis (CCA) as a means to explore the relation between metabolome data and (i) biochemically related metabolites and (ii) an amino acid biosynthesis pathway. CPCA searches for major trends in the behavior of metabolite concentrations that are common to the metabolites of interest and the remainder of the metabolome. CCA identifies the strongest correlations between the metabolites of interest and the remainder of the metabolome.
CPCA and CCA were applied to two different microbial metabolomics data sets. The first data set, derived from Pseudomonas putida S12, was relatively simple, as it contained metabolomes obtained under only four environmental conditions. The second data set, obtained from Escherichia coli, was much more complex, as it consisted of metabolomes obtained under 28 different environmental conditions. For the simple and coherent P. putida S12 data set, CCA and CPCA gave similar results because the variation in the selected subset of metabolites and the variation in the remainder of the metabolome were similar.
In contrast, CCA and CPCA yielded different results for the E. coli data set. With CPCA, the trends in the selected subset (the phenylalanine biosynthesis pathway) dominated the results. The main trends were related to high and low phenylalanine productivity. The metabolites showing similar concentration behavior were, in the subset, metabolites regulating the phenylalanine biosynthesis route and, in the remainder of the metabolome, metabolites related to general amino acid metabolism. With CCA, neither subset truly dominated the data analysis. CCA described the differences between the wild type and the overproducing strain, and the differences between succinate-grown and glucose-grown cells. For the difference between the wild type and the overproducing strain, metabolites from the beginning and the end of the aromatic amino acid pathways, such as erythrose-4-phosphate, tryptophan, and phenylalanine, were important among the selected metabolites.
CCA and CPCA proved to be complementary data analysis tools that focus the data analysis on groups of metabolites of specific interest in relation to the remainder of the metabolome. Compared to an ordinary PCA, focusing the data analysis on biologically relevant metabolites led to a better biological interpretation of the data, especially for the complex E. coli data set.
Keywords:Metabolomics   Microbiology   Multiblock analysis   Principal component analysis   Canonical correlation analysis
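The two-block setup described in the abstract (a subset of metabolites of interest versus the remainder of the metabolome) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. All function names and the synthetic data are hypothetical. The CPCA part uses a common simplification in which the super scores are taken from an ordinary PCA of the autoscaled, block-scaled, concatenated data. The CCA part uses the standard QR-plus-SVD route and assumes more samples than metabolites in each block and full column rank; real metabolomics data would typically need dimension reduction or regularization first.

```python
import numpy as np


def autoscale(X):
    """Center each column and scale it to unit variance."""
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    return Xc / sd


def cpca_super_scores(blocks, n_components=2):
    """Super scores from a PCA of the block-scaled, concatenated blocks.

    Assumption: each block is autoscaled and then divided by its Frobenius
    norm, so that every block contributes the same total sum of squares.
    """
    scaled = []
    for B in blocks:
        A = autoscale(np.asarray(B, dtype=float))
        scaled.append(A / np.linalg.norm(A))
    X = np.hstack(scaled)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def cca(X, Y, n_components=1):
    """Canonical correlations and canonical variates of two blocks.

    Assumption: both centered blocks have full column rank and more rows
    (samples) than columns (metabolites).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)  # reduced QR: Qx is n x p, Rx is p x p
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    a = np.linalg.solve(Rx, U[:, :n_components])     # weights for block X
    b = np.linalg.solve(Ry, Vt.T[:, :n_components])  # weights for block Y
    return s[:n_components], Xc @ a, Yc @ b


# Synthetic stand-in data (hypothetical): 40 samples that share one latent
# trend; 3 "metabolites of interest" and 4 metabolites in the "remainder".
rng = np.random.default_rng(0)
z = rng.normal(size=40)
subset = np.outer(z, [1.0, 0.5, -0.8]) + 0.1 * rng.normal(size=(40, 3))
remainder = np.outer(z, [0.7, -1.0, 0.4, 0.9]) + 0.1 * rng.normal(size=(40, 4))

super_scores = cpca_super_scores([subset, remainder], n_components=2)
corrs, u, v = cca(subset, remainder, n_components=1)
print("first canonical correlation:", corrs[0])
```

In this sketch the block scaling in `cpca_super_scores` keeps the larger block from dominating the common trends merely because it has more variables, while `cca` ignores block variance entirely and looks only for the most strongly correlated linear combinations. That difference is one way to read why the two methods can disagree on a heterogeneous data set.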
This article is indexed in ScienceDirect and other databases.