Multilevel component analysis of time-resolved metabolic fingerprinting data
Authors: Jeroen J. Jansen, Huub C.J. Hoefsloot, Jan van der Greef, Marieke E. Timmerman, Age K. Smilde
Affiliations:

a Biosystems Data Analysis, Faculty of Sciences, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands

b TNO Nutrition and Food Research, PO Box 360, 3700 AJ Zeist, The Netherlands

c Beyond Genomics, 40 Bear Hill Road, Waltham, MA 02451, USA

d Heymans Institute of Psychology, DPMG, University of Groningen, Grote Kruisstraat 2/1, 9712 TS Groningen, The Netherlands

Abstract: Genomics-based technologies in systems biology have become widely used in recent years. These technologies generate large amounts of data, and multivariate data analysis methods are required to extract information from them. Many of the datasets generated in genomics are multilevel datasets, in which variation occurs on several levels simultaneously (e.g. variation between organisms and variation in time). We introduce multilevel component analysis (MCA) into the field of metabolic fingerprinting to separate these different types of variation. The commonly used principal component analysis (PCA) cannot do this: in a PCA model the different types of variation in a multilevel dataset are confounded.
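As a brief illustration of what "separating types of variation" means, the split can be written as an identity on the data. This is a sketch only; the symbols are assumptions introduced here, not notation taken from the abstract. Let x_{ik} be the measurement vector of individual i at time point k, x̄_i the mean of individual i over its K_i time points, and m an overall offset.

```latex
% Split each measurement into an offset, a between-individual part,
% and a within-individual (time-resolved) part:
\[
  x_{ik} \;=\; m \;+\; \underbrace{(\bar{x}_i - m)}_{\text{between individuals}}
          \;+\; \underbrace{(x_{ik} - \bar{x}_i)}_{\text{within individual, over time}}
\]
% Because the within-individual deviations sum to zero over k for every i,
% the cross term vanishes and the sums of squares add up:
\[
  \sum_{i}\sum_{k=1}^{K_i} \lVert x_{ik} - m \rVert^2
  \;=\; \sum_{i} K_i \,\lVert \bar{x}_i - m \rVert^2
  \;+\; \sum_{i}\sum_{k=1}^{K_i} \lVert x_{ik} - \bar{x}_i \rVert^2
\]
```

Each of the two parts on the right can then be given its own component model, so that neither type of variation leaks into the other.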

MCA generates a separate submodel for each type of variation. Each submodel is a lower-dimensional component model that approximates the corresponding variation and is easier to interpret than the original data. Multilevel simultaneous component analysis (MSCA) is a method within the class of MCA models that offers increased interpretability, because the time-resolved variation of all individuals is expressed in the same subspace.
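The two submodels described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a balanced design (every individual measured at the same number of time points), uses synthetic random data, and the function name `msca_sketch` and its parameters are hypothetical.

```python
import numpy as np

def msca_sketch(X, n_between=1, n_within=2):
    """Illustrative MSCA-style decomposition (hypothetical helper).

    X has shape (I, K, J): I individuals, K time points, J variables.
    A balanced design is assumed.
    """
    I, K, J = X.shape
    offset = X.mean(axis=(0, 1))            # overall mean per variable, shape (J,)
    Xc = X - offset
    between = Xc.mean(axis=1)               # individual means, shape (I, J)
    within = Xc - between[:, None, :]       # deviations over time, shape (I, K, J)

    # Between-individual submodel: PCA of the individual means.
    _, _, Vbt = np.linalg.svd(between, full_matrices=False)
    Pb = Vbt[:n_between].T                  # between loadings, shape (J, n_between)
    Tb = between @ Pb                       # between scores, shape (I, n_between)

    # Within-individual submodel: one loading matrix shared by all individuals,
    # obtained from the within deviations stacked over individuals.
    W = within.reshape(I * K, J)
    _, _, Vwt = np.linalg.svd(W, full_matrices=False)
    Pw = Vwt[:n_within].T                   # shared within loadings, shape (J, n_within)
    Tw = (W @ Pw).reshape(I, K, n_within)   # within scores per individual

    return {
        "offset": offset, "Pb": Pb, "Tb": Tb, "Pw": Pw, "Tw": Tw,
        "ss_total": np.sum(Xc ** 2),
        "ss_between": K * np.sum(between ** 2),
        "ss_within": np.sum(within ** 2),
    }

# Synthetic example data (random numbers, not real spectra).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 29, 6))
model = msca_sketch(X)
```

Because all individuals share the single within-loading matrix `Pw`, their time courses in `Tw` can be compared directly in one subspace, which is the interpretability argument made above.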

MSCA is applied to a time-resolved metabolomics dataset. This dataset contains ¹H NMR spectra of urine collected from 10 monkeys at 29 time points over 2 months. The MSCA model contains a submodel describing the biorhythms in the urine composition and a submodel describing the variation between the animals. Using MSCA, the largest biorhythms in the urine composition and the largest variation between the animals are identified.

Comparison of the MSCA model with a PCA model of these data shows that the MSCA model is easier to interpret: it gives a clearer view of the different types of variation in the data, since they are not confounded.
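What "confounded" means for PCA can be made concrete with a small synthetic example. The sketch below is an assumption-laden illustration, not a reanalysis of the monkey data: the simulated animal effect, the sinusoidal rhythm, and the noise level are all invented here. It fits an ordinary PCA to the stacked measurements and then checks how the sum of squares of the first component's scores splits into a between-animal part and a within-animal part.

```python
import numpy as np

rng = np.random.default_rng(1)
I, K, J = 10, 29, 5                          # individuals, time points, variables
time = np.linspace(0.0, 2.0 * np.pi, K)

# Invented effects: an animal-specific offset and a shared rhythm that
# overlap on the first variable, plus a little noise.
animal_effect = rng.normal(size=(I, 1, 1)) * np.array([1.0, 1.0, 0.0, 0.0, 0.0])
rhythm = np.sin(time)[None, :, None] * np.array([1.0, 0.0, 1.0, 0.0, 0.0])
X = animal_effect + rhythm + 0.1 * rng.normal(size=(I, K, J))

# Ordinary PCA on all measurements stacked, ignoring the multilevel structure.
stacked = X.reshape(I * K, J)
stacked = stacked - stacked.mean(axis=0)
_, _, Vt = np.linalg.svd(stacked, full_matrices=False)
scores = (stacked @ Vt[0]).reshape(I, K)     # first-component scores per animal and time

# Split the score sum of squares into its two sources.
animal_means = scores.mean(axis=1, keepdims=True)
between_part = K * np.sum(animal_means ** 2)
within_part = np.sum((scores - animal_means) ** 2)
frac_between = between_part / (between_part + within_part)
print(f"fraction of PC1 score variation between animals: {frac_between:.2f}")
```

A fraction that is neither close to 0 nor close to 1 indicates that the single PCA component carries both animal-to-animal differences and time-resolved variation, whereas a multilevel model assigns the two to separate submodels by construction.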

Keywords: NMR; Principal component analysis; Urinalysis; Metabolomics; Biorhythms; Types of variation
This article is indexed in ScienceDirect and other databases.