Similar literature
1.
This review focuses on recent and potential advances in chemometric methods for data processing in metabolomics, especially for data generated by mass spectrometric techniques. Metabolomics is gradually being regarded as a valuable and promising biotechnology rather than an ambitious advancement. Herein, we outline significant developments in metabolomics, especially in combination with modern chemical analysis techniques and dedicated statistical and chemometric data-analysis strategies. Advanced techniques for the preprocessing of raw data, identification of metabolites, variable selection, and modeling are illustrated. We believe that insights from these developments will help narrow the gap between the original dataset and current biological knowledge. We also discuss the limitations and perspectives of extracting information from high-throughput datasets.

2.
3.
Metabolomics is used to reduce the complexity of plants and to understand the underlying pathways of the plant phenotype. The metabolic profile of plants can be obtained by mass spectrometry or liquid-state NMR. Both techniques require the metabolites to be extracted from the sample before the metabolic profile can be obtained. This extraction step can be eliminated by using high-resolution magic angle spinning (HR-MAS) NMR. In this review, an HR-MAS NMR-based workflow is described in more detail, including the pulse sequences used in metabolomics. The pre-processing steps of one-dimensional HR-MAS NMR spectra are presented, including spectral alignment, baseline correction, bucketing, normalisation and scaling procedures. We also highlight some of the models that can be used to perform multivariate analysis on HR-MAS NMR spectra. Finally, applications of HR-MAS NMR in plant metabolomics are described; they show that HR-MAS NMR is a powerful tool for plant metabolomics studies.

4.
In recent years, many analyses have been carried out to investigate the chemical components of food data. However, studies rarely consider the compositional pitfalls of such analyses. This is problematic because applying non-compositional statistical analysis to compositional datasets may lead to arbitrary results. In this study, compositional data analysis (CoDa), which is widely used in other research fields, is compared with classical statistical analysis to demonstrate how the results vary depending on the approach and to show the best possible statistical analysis. For example, honey and saffron are highly susceptible to adulteration and imitation, so the determination of their chemical elements requires the best possible statistical analysis. Our study demonstrated how principal component analysis (PCA) and classification results are influenced by the pre-processing steps conducted on the raw data and by the replacement strategies for missing values and non-detects. Furthermore, it demonstrated the differences in results when compositional and non-compositional methods were applied. Our results suggested that log-ratio analysis provided better separation between the pure and adulterated data, easier interpretability of the results, and a higher classification accuracy. Similarly, the study showed that classification with artificial neural networks (ANNs) works poorly if the CoDa pre-processing steps are left out. From these results, we advise the application of CoDa methods for analyses of the chemical elements of food and for the characterization and authentication of food products.
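
The log-ratio idea in the abstract above can be illustrated with the centered log-ratio (CLR) transform. This is a minimal sketch: the abstract does not say which log-ratio transform the authors used, so CLR is an assumption here, and the function name `clr` and the sample values are hypothetical.

```python
import math


def clr(composition):
    """Centered log-ratio transform of one composition (assumed transform).

    Each part is replaced by its log minus the mean of all logs, so only
    the ratios between parts matter, not the overall scale.
    """
    if any(x <= 0 for x in composition):
        # Zeros and non-detects must be replaced before taking logs.
        raise ValueError("CLR needs strictly positive parts")
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical element concentrations for one sample, in arbitrary units.
sample = [10.0, 30.0, 60.0]
# The same sample reported on a different overall scale.
rescaled = [x * 5 for x in sample]

print(clr(sample))
print(clr(rescaled))
```

Both calls print the same values up to floating-point rounding, because the transform depends only on the ratios between parts. The `ValueError` marks where a replacement strategy for missing values and non-detects would have to run first.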

5.
Metabolomics studies aim at a better understanding of biochemical processes by studying relations between metabolites and between metabolites and other types of information (e.g., sensory and phenotypic features). The objectives of these studies are diverse, but the types of data generated and the methods for extracting information from the data and analysing the data are similar. Besides instrumental analysis tools, various data-analysis tools are needed to extract this relevant information. The entire data-processing workflow is complex and has many steps. For a comprehensive overview, we cover the entire workflow of metabolomics studies, starting from experimental design and sample-size determination to tools that can aid in biological interpretation. We include illustrative examples and discuss the problems that have to be dealt with in data analysis in metabolomics. We also discuss where the challenges are for developing new methods and tailor-made quantitative strategies.

6.
Data from high-throughput metabolomics experiments are becoming increasingly large and complex, which poses considerable challenges for existing statistical modeling. There is thus a need to develop statistically efficient approaches for mining the underlying metabolite information contained in the metabolomics data under investigation. In this work, we developed a novel kernel Fisher discriminant analysis (KFDA) algorithm by constructing an informative kernel based on a decision tree ensemble. The constructed kernel can effectively encode the similarities of metabolomics samples between informative metabolites/biomarkers in specific parts of the measurement space. Simultaneously, informative metabolites or potential biomarkers can be discovered by variable importance ranking in the process of building the kernel. Moreover, with such a kernel KFDA can also deal, to some extent, with nonlinear relationships in the metabolomics data. Finally, two real metabolomics datasets and a simulated dataset were used to demonstrate the performance of the proposed approach in comparison with other approaches.

7.
Partial least squares-discriminant analysis (PLS-DA) predominates in the analysis of metabolomics datasets (indeed, it is the most well-known tool for classification and regression in metabolomics), to the point that not all researchers are fully aware of alternative multivariate classification algorithms. This may in part be due to the widespread availability of PLS-DA in most of the well-known statistical software packages, where its implementation is very easy if the default settings are used. In addition, one of the perceived advantages of PLS-DA is its ability to analyze highly collinear and noisy data. Furthermore, the calibration model is known to provide a variety of useful statistics, such as prediction accuracy as well as scores and loadings plots. However, this method may provide misleading results, largely due to a lack of suitable statistical validation, when used by non-experts who are not aware of its potential limitations in conjunction with metabolomics. This tutorial review aims to provide an introductory overview of several straightforward statistical methods, such as principal component-discriminant function analysis (PC-DFA), support vector machines (SVM) and random forests (RF), which could very easily be used either to augment PLS or as alternative supervised learning methods to PLS-DA. These methods are particularly appropriate for the analysis of the large, highly complex data sets that are common outputs of metabolomics studies, where the number of variables often far exceeds the number of samples. In addition, these alternative techniques may be useful tools for generating parsimonious models through feature selection and data reduction, as well as providing more propitious results.
We sincerely hope that the general reader is left with little doubt that there are several promising and readily available alternatives to PLS-DA for analyzing large and highly complex data sets.

8.
The scale at which MS- and NMR-based platforms generate metabolomics datasets for research, core, and clinical facilities to address challenges in the various sciences, ranging from biomedical to agricultural, is underappreciated. Metabolomics efforts spanning microbe, environment, plant, animal, and human systems have led to continual and concomitant growth of in silico resources for analysis and interpretation of these datasets. These software tools, resources, and databases drive the field forward and help it keep pace with the amount of data being generated and with the sophisticated and diverse analytical platforms used to generate these metabolomics datasets. To address challenges in data preprocessing, metabolite annotation, statistical interrogation, visualization, interpretation, and integration, the metabolomics and informatics research community comes up with hundreds of tools every year. The purpose of the present review is to provide a brief and useful summary of more than 95 metabolomics tools, software, and databases that were either developed or significantly improved during 2017–2018. We hope that this review helps readers, developers, and researchers to obtain informed access to these thorough lists of resources for further improvisation, implementation, and application in due course of time.

9.
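In the preprocessing of NMR metabolomics data, the main purpose of scaling is to increase the weight of information from characteristic metabolites and to reduce the influence of noise and irrelevant metabolites, which makes subsequent pattern-recognition analysis easier. This paper proposes a new scaling method that does not insist on bringing all variables to a common scale. Instead, starting from the raw data, it increases the weight of variables that are highly stable and differ significantly between sample classes, so as to enhance the information related to characteristic metabolites. The performance of the new method was evaluated on both simulated and real metabolomics data and compared with commonly used scaling methods such as Unit Variance scaling, Variable Stability scaling, and Level Scaling. The results show that the new method improves the predictive ability of multivariate statistical models, better preserves the molecular information in the NMR spectra, helps identify characteristic metabolites, and makes the results of subsequent data analysis more interpretable.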
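
For orientation, the three comparison methods named in the abstract above can be sketched for a single variable (one spectral bin across samples). The formulas are the commonly cited textbook definitions and are assumed to match the paper's usage. The abstract does not give a formula for the new method, so it is not reproduced here. Function names and data are hypothetical.

```python
import statistics


def unit_variance(column):
    """Unit variance scaling: mean-center, then divide by the standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]


def level_scaling(column):
    """Level scaling: mean-center, then divide by the mean."""
    mean = statistics.mean(column)
    return [(x - mean) / mean for x in column]


def vast_scaling(column):
    """Variable stability scaling: unit variance scaling weighted by
    mean / sd, so stable variables (small relative spread) gain weight."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd * (mean / sd) for x in column]


# Hypothetical intensities of one spectral bin across three samples.
intensities = [1.0, 2.0, 3.0]
print(unit_variance(intensities))
print(level_scaling(intensities))
print(vast_scaling(intensities))
```

All three functions assume a non-constant column with a non-zero mean. A real pipeline would need to guard against zero standard deviation and zero mean.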

10.
Metabolomics datasets generated by modern analytical instruments tend to be increasingly complex. In this study, a recent method named shrunken centroids regularized discriminant analysis (SCRDA) is introduced and applied to the exploration of metabolomics datasets. It is a supervised method for variable selection, discriminant analysis and biomarker screening. By regularizing the estimate of the within-class covariance matrix, SCRDA can deal with the singularity issue of linear discriminant analysis. A shrinkage estimator is then applied to perform variable selection. The method is illustrated on simulated datasets and three complex metabolomics datasets. The commonly used orthogonal partial least squares discriminant analysis and two other similar statistical methods, penalized linear discriminant analysis and nearest shrunken centroids, are used for comparison. The results illustrate that SCRDA has some desirable abilities in variable selection, classification and prediction. Moreover, the biomarkers identified by SCRDA are further demonstrated to be in accordance with biochemical research. It has been proved that SCRDA can be applied as a promising strategy in metabolomics. Copyright © 2014 John Wiley & Sons, Ltd.

11.
Metabolomics aims to better understand biological systems through the chemical analysis of an organism's metabolic profile. One common method of analysis is mass spectrometry preceded by chromatographic separations. Samples produced by metabolomics investigations can contain hundreds to thousands of compounds, which can put great strain on the instrumental analysis. In order to improve these analyses, the data analysis must not be overlooked. The ever-evolving field of chemometrics provides many useful tools for the analysis of chromatographic data. These include methods for preprocessing data to extract a maximum amount of information from the data as well as pattern recognition in order to find the compounds that vary the most in relation to the perturbation to the biological system under study. This article aims to highlight and provide future outlooks on current chemometric methods for chromatographic-based metabolomics investigations. Copyright © 2014 John Wiley & Sons, Ltd.

12.
13.
When quantifying information in metabolomics, the results are often expressed as data carrying only relative information. Vectors of these data have positive components, and the only relevant information is contained in the ratios between their parts; such observations are called compositional data. The aim of the paper is to demonstrate how partial least squares discriminant analysis (PLS-DA), the most widely used method in chemometrics for multivariate classification, can be applied to compositional data. Theoretical arguments are provided, and data sets from metabolomics are investigated. The data are related to the diagnosis of inherited metabolic disorders (IMDs). The first example analyzes the significance of the corresponding regression parameters (metabolites) using a small data set resulting from targeted metabolomics, where just a subset of potential markers is selected. The second example follows the approach of untargeted metabolomics, with an analysis detecting almost 500 metabolites. The significance of the metabolites is investigated by applying PLS-DA, adapted to a compositional approach. The significance of important metabolites (markers of diseases) is more clearly visible with the compositional method in both examples. Cross-validation methods also lead to better results when the compositional approach is used. Copyright © 2014 John Wiley & Sons, Ltd.

14.
The use of chemometric methods based on the analysis of variance (ANOVA) allows evaluation of the statistical significance of the experimental factors used in a study. However, classical multivariate ANOVA (MANOVA) has a number of requirements that make it impractical for dealing with metabolomics data. For this reason, different options that overcome these limitations have appeared in recent years. In this work, we evaluate the performance of three of these multivariate ANOVA-based methods, namely ANOVA simultaneous component analysis (ASCA), regularized MANOVA (rMANOVA), and group-wise ANOVA-simultaneous component analysis (GASCA), in the framework of metabolomics studies. Our main goals are to compare these ANOVA-based approaches and to evaluate their performance on experimentally designed metabolomic studies, finding the significant factors and identifying the most relevant variables (potential markers) from the obtained results. Two experimental data sets with designs of different complexity were generated using liquid chromatography coupled to mass spectrometry (LC-MS) to evaluate the performance of the statistical approaches. Results show that the three ANOVA-based methods perform similarly in detecting statistically significant factors. However, the relevant variables pointed out by GASCA seem to be more reliable, as they closely match the variables detected by the widely used partial least squares discriminant analysis (PLS-DA) method.

15.
NMR-based metabolomics is characterized by high-throughput measurements of the signal intensities of complex mixtures of metabolites in biological samples, typically by assaying bio-fluids or tissue homogenates. The ultimate goal is to obtain relevant biological information regarding the dissimilarity in patho-physiological conditions that the samples experience. For a long time now, this information has been obtained through the analysis of measured NMR signals via multivariate statistics. NMR data are quite complex, and the use of multivariate statistical methods such as principal components analysis (PCA) for their analysis assumes that the data are multivariate normal with errors that are identical, independent and normally distributed (i.e. iid normal). There is a consensus that these assumptions are not always true for these data and, thus, several methods have been devised to transform the data or weight them prior to analysis by PCA. Neither the structure of NMR measurement noise nor the extent to which violations of error homoscedasticity affect PCA results has been characterized or investigated. A comprehensive characterization of measurement uncertainties in NMR-based metabolomics was achieved in this work using an experiment designed to capture the contributions of several sources of error to the total variance in the measurements. The noise structure was found to be heteroscedastic and highly correlated, with spectral characteristics that are similar to the mean of the spectra and their standard deviation. A model was subsequently developed that potentially allows errors in NMR measurements to be accurately estimated without the need for extensive replication.

16.
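To address data-processing problems in metabolomics research, this study established MS-IAS (Mass spectrometry based integrated analysis system), a data analysis system based on mass spectrometry. The system integrates multiple methods for processing mass spectrometry data, including feature selection, clustering, and classification, and provides several statistical analysis methods for comparing the selected feature variables in order to discover potential biomarkers relevant to the question under study. MS-IAS supports graphical display of the data and of the results of the various algorithms, which aids interpretation and analysis of the data. The functions of MS-IAS were demonstrated on mass spectrometry metabolomics data from patients with liver disease: two feature selection algorithms selected 40 feature variables from the dataset that can discriminate liver disease, showing the potential of MS-IAS to become a general-purpose mass spectrometry data analysis system for metabolomics research.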

17.
This work investigates the influence of pre-processing on the recognition of chemically similar areas in a spectral image, using simulated data. Fictitious spectra of mixtures of five components at varying concentrations were corrupted by different types of noise to mimic typical signals from Raman imaging. They were then processed by various combinations of pre-processing functions, including baseline correction, smoothing, normalization and Principal Components (PC) compression, and by two clustering algorithms (k-means and agglomerative hierarchical clustering) to recognize the original mixtures. The clusters obtained with the different pre-processing combinations and distance metrics were evaluated by statistical parameters (Rand index and silhouette coefficient) and by visual inspection. On the basis of all considered criteria, perhaps the best-performing combination uses an adaptive polynomial detrending, a slight smoothing, normalization by the total signal intensity and compression to 4 PCs (spanning 80% of the total variance). A more detailed analysis was also carried out on subsets of the whole data with a particular type of noise and on the influence of each single pre-processing/clustering variable.
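
As a small illustration of one of the evaluation statistics named above, the Rand index between two labelings can be computed by pair counting. This is a minimal sketch using the standard (unadjusted) definition; the abstract does not say whether an adjusted variant was used, and the function name and example labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two samples
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same samples")
    if len(labels_a) < 2:
        raise ValueError("need at least two samples to form a pair")
    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        agreements += same_in_a == same_in_b
        pairs += 1
    return agreements / pairs


# Hypothetical pixel labels: known mixtures versus a clustering result.
true_mixture = [0, 0, 1, 1]
clustering = [1, 1, 0, 0]  # same partition, different label names
print(rand_index(true_mixture, clustering))  # → 1.0
```

The index ignores the cluster label names and compares only the partitions, which is why the relabeled clustering above scores 1.0.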

18.
Cao DS, Wang B, Zeng MM, Liang YZ, Xu QS, Zhang LX, Li HD, Hu QN. The Analyst, 2011, 136(5): 947-954
Data from high-throughput metabolomics experiments have become increasingly large and complex, which poses a number of challenges for existing statistical modeling. There is thus a need to develop a statistically efficient approach for mining the underlying metabolite information contained in the metabolomics data under investigation. In this work, we provide a new strategy based on Monte Carlo cross validation coupled with the classification tree algorithm, termed the MCTree approach. The MCTree approach inherently provides a feasible way to uncover the predictive structure of metabolomics data by establishing many cross-predictive models. The sample proximity matrix thus obtained seems able to give some interesting insights into metabolomics data. Simultaneously, informative metabolites or potential biomarkers can be discovered by means of variable importance ranking in the MCTree approach. Finally, two real metabolomics datasets are used to demonstrate the performance of the proposed approach.
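
The resampling step behind Monte Carlo cross validation can be sketched as repeated random train/test splits. This illustrates only the splitting. The classification trees, the sample proximity matrix and the variable importance ranking of the MCTree approach are not reproduced, and the function name, the 30% test fraction and the seed are hypothetical choices.

```python
import random


def monte_carlo_splits(n_samples, n_repeats, test_fraction=0.3, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits.

    Unlike k-fold cross validation, each repeat draws a fresh random
    test set, so a sample may be tested many times or not at all.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


# Each split would train one model on `train` and predict `test`.
for train, test in monte_carlo_splits(n_samples=10, n_repeats=3):
    print("train:", train, "test:", test)
```

In a full workflow, one model would be fitted per split and its predictions on the held-out samples collected across repeats.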

19.
Electrophoresis, 2018, 39(7): 909-923
Rapid advances in mass spectrometry (MS) and nuclear magnetic resonance (NMR)-based platforms for metabolomics have led to an upsurge of data every single year. Newer high-throughput platforms, hyphenated technologies, miniaturization, and tool kits in data acquisition efforts in metabolomics have led to additional challenges in metabolomics data pre-processing, analysis, interpretation, and integration. Thanks to the informatics, statistics, and computational community, new resources continue to develop for metabolomics researchers. The purpose of this review is to provide a summary of the metabolomics tools, software, and databases that were developed or improved during 2016–2017, thus, enabling readers, developers, and researchers access to a succinct but thorough list of resources for further improvisation, implementation, and application in due course of time.

20.
