Similar literature
Found 20 similar documents (search time: 406 ms)
1.
Many experimental factors may have an impact on chemical or biological systems. A thorough investigation of the potential effects and interactions between the factors is made possible by rationally planning the trials using systematic procedures, i.e. design of experiments. However, assessing the factors' influences often remains a challenging task when dealing with hundreds to thousands of correlated variables while only a limited number of samples is available. In that context, most of the existing strategies involve the ANOVA-based partitioning of sources of variation and the separate analysis of ANOVA submatrices using multivariate methods, to account for both the intrinsic characteristics of the data and the study design. However, these approaches lack the ability to summarise the data using a single model and remain somewhat limited for detecting and interpreting subtle perturbations hidden in complex Omics datasets.

2.
Fourier transform infrared spectroscopy (FTIR) has been studied many times in the context of identification of plant, fungal and bacterial species. Infrared spectra are commonly analyzed using multivariate statistical methods such as cluster analysis (CA), principal component analysis (PCA), partial least squares analysis (PLS) and discriminant analysis (DA). In this study, a univariate statistical method, analysis of variance (ANOVA), was used to reduce the number of variables before applying the multivariate methods. Analyzing variables using ANOVA or a combination of ANOVA with CA produced better results. Here, experiments were carried out by performing ANOVA using the first derivative of the spectra instead of the original spectra or their second derivative, because using the first‐derivative variables led to improved distinction between species. Different results were obtained by applying different validation methods. The leave‐one‐out validation method gave higher results than validation with separate training and validation sample sets, thus indicating the non‐objectivity of the leave‐one‐out validation method. Copyright © 2010 John Wiley & Sons, Ltd.

3.
4.
NMR-based metabolomics is characterized by high throughput measurements of the signal intensities of complex mixtures of metabolites in biological samples by assaying, typically, bio-fluids or tissue homogenates. The ultimate goal is to obtain relevant biological information regarding the dissimilarity in patho-physiological conditions that the samples experience. For a long time now, this information has been obtained through the analysis of measured NMR signals via multivariate statistics. NMR data are quite complex and the use of such multivariate statistical methods as principal components analysis (PCA) for their analysis assumes that the data are multivariate normal with errors that are identical, independent and normally distributed (i.e. iid normal). There is a consensus that these assumptions are not always true for these data and, thus, several methods have been devised to transform the data or weight them prior to analysis by PCA. The structure of NMR measurement noise, and the extent to which violations of error homoscedasticity affect PCA results, have been neither characterized nor investigated. A comprehensive characterization of measurement uncertainties in NMR based metabolomics was achieved in this work using an experiment designed to capture contributions of several sources of error to the total variance in the measurements. The noise structure was found to be heteroscedastic and highly correlated with spectral characteristics that are similar to the mean of the spectra and their standard deviation. A model was subsequently developed that potentially allows errors in NMR measurements to be accurately estimated without the need for extensive replication.

5.
The predominance of partial least squares-discriminant analysis (PLS-DA) used to analyze metabolomics datasets (indeed, it is the most well-known tool to perform classification and regression in metabolomics), can be said to have led to the point that not all researchers are fully aware of alternative multivariate classification algorithms. This may in part be due to the widespread availability of PLS-DA in most of the well-known statistical software packages, where its implementation is very easy if the default settings are used. In addition, one of the perceived advantages of PLS-DA is that it has the ability to analyze highly collinear and noisy data. Furthermore, the calibration model is known to provide a variety of useful statistics, such as prediction accuracy as well as scores and loadings plots. However, this method may provide misleading results, largely due to a lack of suitable statistical validation, when used by non-experts who are not aware of its potential limitations when used in conjunction with metabolomics. This tutorial review aims to provide an introductory overview to several straightforward statistical methods such as principal component-discriminant function analysis (PC-DFA), support vector machines (SVM) and random forests (RF), which could very easily be used either to augment PLS or as alternative supervised learning methods to PLS-DA. These methods can be said to be particularly appropriate for the analysis of large, highly-complex data sets which are common output(s) in metabolomics studies where the numbers of variables often far exceed the number of samples. In addition, these alternative techniques may be useful tools for generating parsimonious models through feature selection and data reduction, as well as providing more propitious results. 
We sincerely hope that the general reader is left with little doubt that there are several promising and readily available alternatives to PLS-DA, to analyze large and highly complex data sets.

6.
The metabolomics approach has proved to be promising in achieving non-targeted screening for unknown and unexpected (U&U) contaminants in foods, but data analysis is often the bottleneck of the approach. In this study, a novel metabolomics analytical method was developed that seeks marker compounds among 50 pharmaceutical and personal care products (PPCPs) spiked as U&U contaminants into lettuce and maize matrices, based on ultrahigh-performance liquid chromatography-tandem mass spectrometry (UHPLC-MS/MS) output results. Three concentration groups (20, 50 and 100 ng mL⁻¹) were designed to simulate the control and experimental groups applied in traditional metabolomics analysis in order to discover marker compounds, for which multivariate and univariate analysis were adopted. In multivariate analysis, each concentration group showed obvious separation from the other two groups in principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA) plots, providing the possibility to discern marker compounds among groups. Parameters including S-plot, permutation test and variable importance in projection (VIP) in OPLS-DA were used for screening and identification of marker compounds, which further underwent pairwise t-test and fold change judgement for univariate analysis. The results indicate that marker compounds representing all 50 PPCPs were discovered in the two plant matrices, proving the excellent practicability of the metabolomics approach for non-targeted screening of various U&U PPCPs in plant-derived foods. The limits of detection (LODs) for the 50 PPCPs were calculated to be 0.4–2.0 µg kg⁻¹ and 0.3–2.1 µg kg⁻¹ in lettuce and maize matrices, respectively.

7.
The genus Datura (Solanaceae) contains nine species of medicinal plants that have held both curative utility and cultural significance throughout history. This genus’ particular bioactivity results from the enormous diversity of alkaloids it contains, making it a valuable study organism for many disciplines. Although Datura contains mostly tropane alkaloids (such as hyoscyamine and scopolamine), indole, beta-carboline, and pyrrolidine alkaloids have also been identified. The tools available to explore specialized metabolism in plants have undergone remarkable advances over the past couple of decades and provide renewed opportunities for discoveries of new compounds and the genetic basis for their biosynthesis. This review provides a comprehensive overview of studies on the alkaloids of Datura that focuses on three questions: How do we find and identify alkaloids? Where do alkaloids come from? What factors affect their presence and abundance? We also address pitfalls and relevant questions applicable to natural products and metabolomics researchers. With both careful perspectives and new advances in instrumentation, the pace of alkaloid discovery, and not just from Datura, has the potential to accelerate dramatically in the near future.

8.
Plant stress responses are mediated by the release of chemical compounds called exudates into the rhizosphere. These chemical substances include primary and secondary plant metabolites and play an important role in the plant defense mechanism. The identification, characterization and study of these compounds can open the door to numerous applications, from greener agriculture to enhanced phytoremediation. This paper critically reviews the most relevant sampling strategies, analytical methodologies, and data-mining approaches to study root exudates. Common analytical techniques are grounded in mass spectrometry or nuclear magnetic resonance (NMR) spectroscopy, but less common biospectroscopy techniques could offer a new perspective in plant metabolomics due to the minimal sample processing they require. Finally, the collected raw data must be analyzed by means of different multivariate and univariate statistical approaches to test biological-response hypotheses. All in all, the assessment of root exudates calls for the development of hyphenated analytical methodologies, as well as efforts to consolidate data-preprocessing workflows.

9.
Many advanced metabolomics experiments currently lead to data where a large number of response variables were measured while one or several factors were changed. Often the number of response variables vastly exceeds the sample size and well-established techniques such as multivariate analysis of variance (MANOVA) cannot be used to analyze the data.

10.
Novel post‐genomics experiments such as metabolomics provide datasets that are highly multivariate and often reflect an underlying experimental design, developed with a specific experimental question in mind. ANOVA‐simultaneous component analysis (ASCA) can be used for the analysis of multivariate data obtained from an experimental design instead of the widely used principal component analysis (PCA). This increases the interpretability of the model in terms of the experimental question. Aside from the levels of individual factors, variation that can be described by the experimental design may also depend on levels of multiple (crossed) factors simultaneously, i.e. the interactions. ASCA describes each contribution with a PCA model, but a contribution depending on crossed factors may be described more parsimoniously by multiway models like parallel factor analysis (PARAFAC). The combination of PARAFAC and ASCA, named PARAFASCA, provides a view on the data that is both parsimonious and focused on the experimental question. The novel method is used to analyze a dataset in which the effect of two doses of hydrazine on the urinary chemical composition of rats is investigated by time‐resolved metabolic fingerprinting with nuclear magnetic resonance (NMR) spectroscopy. This experiment has been conducted to monitor the dose‐specific urine composition changes in time upon hydrazine administration. Comparison of the PCA, the ASCA and the PARAFASCA models shows that ASCA and PARAFASCA describe the data in a way that is more focused on the experimental question than PCA, but that PARAFASCA is more parsimonious than ASCA and separates the variation underlying different effects better. Copyright © 2008 John Wiley & Sons, Ltd.

11.
The statistical design of experiments (DOE) is a collection of predetermined settings of the process variables of interest, which provides an efficient procedure for planning experiments. Experiments on biological processes typically produce long sequences of successive observations on each experimental unit (plant, animal, bioreactor, fermenter, or flask) in response to several treatments (combinations of factors). Cell culture and other biotech-related experiments have been performed using the repeated-measures method of experimental design coupled with different levels of several process factors to investigate dynamic biological processes. Data collected from this design can be analyzed by several kinds of general linear model (GLM) statistical methods such as multivariate analysis of variance (MANOVA), univariate ANOVA (time split-plot analysis with randomization restriction), and analysis of orthogonal polynomial contrasts of the repeated factor (linear coefficient analysis). Finally, a regression model was introduced to describe responses over time to the different treatments, along with model residual analysis. Statistical analysis of bioprocesses with repeated measurements can help investigate environmental factors and effects affecting physiological processes and bioprocesses when analyzing and optimizing biotechnology production.

12.
The statistical design of experiments (DOE) is a collection of predetermined settings of the process variables of interest, which provides an efficient procedure for planning experiments. Experiments on biological processes typically produce long sequences of successive observations on each experimental unit (plant, animal, bioreactor, fermenter, or flask) in response to several treatments (combinations of factors). Cell culture and other biotech-related experiments have been performed using the repeated-measures method of experimental design coupled with different levels of several process factors to investigate dynamic biological processes. Data collected from this design can be analyzed by several kinds of general linear model (GLM) statistical methods such as multivariate analysis of variance (MANOVA), univariate ANOVA (time split-plot analysis with randomization restriction), and analysis of orthogonal polynomial contrasts of the repeated factor (linear coefficient analysis). Finally, a regression model was introduced to describe responses over time to the different treatments, along with model residual analysis. Statistical analysis of bioprocesses with repeated measurements can help investigate environmental factors and effects affecting physiological processes and bioprocesses when analyzing and optimizing biotechnology production.

13.
Ultra-performance liquid chromatography/mass spectrometry-based metabolomics can be used for the discovery of metabolite biomarkers to explore the metabolic pathways of diseases. Identification of metabolic pathways is key to understanding the pathogenesis and mechanism of disease. Myocardial dysfunction induced by sepsis (SMD) is a severe complication of septic shock and a major cause of death in intensive care units; however, its pathological mechanism is still not clear. In this study, ultrahigh-pressure liquid chromatography with mass spectrometry-based metabolomics, combined with chemometric analysis and multivariate pattern recognition analysis, was used to detect urinary metabolic profile changes in a lipopolysaccharide-induced SMD mouse model. Multivariate statistical analysis, including principal component analysis and orthogonal partial least squares discriminant analysis for the discrimination of SMD, was conducted to identify potential biomarkers. A total of 19 differential metabolites were discovered by the high-resolution mass spectrometry-based urinary metabolomics strategy. The altered biochemical pathways based on these metabolites showed that tyrosine metabolism, phenylalanine metabolism, ubiquinone biosynthesis and vitamin B6 metabolism were closely connected to the pathological processes of SMD. Consequently, integrated chemometric analyses of these metabolic pathways are necessary to extract information for the discovery of novel insights into the pathogenesis of disease.

14.
Soluble Mn(III)–L complexes appear to constitute a substantial portion of manganese (Mn) in many environments and serve as critical high-potential species for biogeochemical processes. However, the inherent reactivity and lability of these complexes (the same chemical characteristics that make them uniquely important in biogeochemistry) also make them incredibly difficult to measure. Here we present experimental results demonstrating the limits of common analytical methods used to quantify these complexes. The leucoberbelin-blue method is extremely useful for detecting many high-valent Mn species, but it is incompatible with the subset of Mn(III) complexes that rapidly decompose under low-pH conditions, which are a methodological requirement for the assay. The Cd-porphyrin method works well for measuring Mn(II) species, but it does not work for measuring Mn(III) species, because additional chemistry occurs that is inconsistent with the proposed reaction mechanism. In both cases, the behavior of Mn(III) species in these methods ultimately stems from inter- and intramolecular redox chemistry that curtails the use of these approaches as a reflection of ligand-binding strength. With growing appreciation for the importance of high-valent Mn species and their cycling in the environment, these results underscore the need for additional method development to enable quantifying such species rapidly and accurately in nature.

15.
Near-infrared (NIR) spectroscopy, in combination with chemometrics, enables nondestructive analysis of solid samples without time-consuming sample preparation methods. A new method for the nondestructive determination of compound amoxicillin powder drug via NIR spectroscopy combined with an improved neural network model based on principal component analysis (PCA) and radial basis function (RBF) neural networks is investigated. The PCA technique is applied to extract relevant features from the large volume of spectral data in order to reduce the number of input variables of the RBF neural networks. Various optimum principal component analysis-radial basis function (PCA-RBF) network models based on conventional spectra and preprocessed spectra (standard normal variate (SNV) and multiplicative scatter correction (MSC)) have been established and compared. Principal component regression (PCR) and partial least squares (PLS) multivariate calibrations are also used and compared with the PCA-RBF neural networks. Experimental results show that the proposed PCA-RBF method is more efficient than PCR and PLS multivariate calibrations. The PCA-RBF approach with SNV-preprocessed spectra is found to provide the best performance.

16.
When quantifying information in metabolomics, the results are often expressed as data carrying only relative information. Vectors of these data have positive components, and the only relevant information is contained in the ratios between their parts; such observations are called compositional data. The aim of the paper is to demonstrate how partial least squares discriminant analysis (PLS‐DA), the most widely used method in chemometrics for multivariate classification, can be applied to compositional data. Theoretical arguments are provided, and data sets from metabolomics are investigated. The data are related to the diagnosis of inherited metabolic disorders (IMDs). The first example analyzes the significance of the corresponding regression parameters (metabolites) using a small data set resulting from targeted metabolomics, where just a subset of potential markers is selected. The second example, which follows the untargeted metabolomics approach, involves an analysis detecting almost 500 metabolites. The significance of the metabolites is investigated by applying PLS‐DA, adapted according to a compositional approach. The significance of important metabolites (markers of diseases) is more clearly visible with the compositional method in both examples. Also, cross‐validation methods lead to better results when the compositional approach is used. Copyright © 2014 John Wiley & Sons, Ltd.
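
The statement that only the ratios between parts carry information can be illustrated with a log-ratio transform. The sketch below uses the centred log-ratio (clr) transform, one common choice for compositional data; the abstract does not say which transform the authors applied before PLS‐DA, so this is an assumption, and the function name `clr` and the toy values are purely illustrative.

```python
import math


def clr(parts):
    """Centred log-ratio transform of a strictly positive composition.

    Each part is log-transformed and the mean of the logs is subtracted,
    so the result depends only on ratios between parts.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Toy composition (hypothetical metabolite intensities) and the same
# sample measured on a ten-fold larger scale.
sample = [2.0, 6.0, 12.0]
rescaled = [10.0 * p for p in sample]

clr_sample = clr(sample)
clr_rescaled = clr(rescaled)
```

Because rescaling every part by the same factor leaves all ratios unchanged, `clr_sample` and `clr_rescaled` agree up to floating-point error, and the clr coordinates of a sample sum to zero.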

17.
Multiway principal components analysis (MPCA) and parallel factor analysis (PARAFAC) are widely used in exploratory data analysis and multivariate statistical process control (MSPC). These models are linear in nature and thus limited when non-linear relations are present in the data. Principal component analysis (PCA) can be extended to non-linear principal components analysis using autoassociative neural networks. In this paper, the network’s bottleneck layer outputs (non-linear components) were made orthogonal. A method to estimate confidence limits based on a kernel probability density function was proposed, since these limits do not assume that the non-linear scores are normally distributed. A measure for the non-linear scores (DNL) was presented here to monitor the process on-line, replacing the well-known Hotelling’s T² statistic. One hundred and two industrial fermentation runs were used to evaluate the performance of a non-linear technique for multivariate statistical process monitoring. Three process runs with faults were used to compare the error detection performance using a statistic for the non-linear scores and the residuals statistic (SPE).

18.
19.
In multivariate calibration with spectral datasets, variable selection is often applied to identify a relevant subset of variables, leading to improved prediction accuracy and easy interpretation of the selected fingerprint regions. Until now, numerous variable selection methods have been proposed, but a proper choice among them is not trivial. Furthermore, in many cases, a set of variables found by those methods might not be robust due to irreproducibility and uncertainty issues, posing a great challenge in improving the reliability of the variable selection. In this study, the reproducibility of five variable selection methods was investigated quantitatively to evaluate their performance. The reproducibility of variable selection was quantified by using Monte-Carlo sub-sampling (MCS) techniques together with a quantitative similarity measure designed for highly collinear spectral datasets. The investigation of the reproducibility and prediction accuracy of the several variable selection algorithms with two different near-infrared (NIR) datasets illustrated that the different variable selection methods exhibited wide variability in their performance, especially in their capabilities to identify a consistent subset of variables from the spectral datasets. Thus the thorough assessment of the reproducibility together with the predictive accuracy of the identified variables improved the statistical validity and confidence of the selection outcome, which cannot be addressed by conventional evaluation schemes.
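
The idea of scoring selection reproducibility by Monte-Carlo sub-sampling can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the selector is a hypothetical correlation-ranking filter, and the plain Jaccard index stands in for the collinearity-aware similarity measure mentioned in the abstract, which is not specified there. All names and the toy data are illustrative.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_top_k(X, y, k):
    """Hypothetical selector: the k variables most correlated with y."""
    n_vars = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_vars)]
    ranked = sorted(range(n_vars), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:k])


def jaccard(a, b):
    """Jaccard index: shared variables divided by variables in either set."""
    return len(a & b) / len(a | b)


def selection_reproducibility(X, y, k, n_runs=30, fraction=0.8, seed=0):
    """Mean pairwise Jaccard index of selections over random sub-samples."""
    rng = random.Random(seed)
    n = len(X)
    size = max(2, int(fraction * n))
    selections = []
    for _ in range(n_runs):
        idx = rng.sample(range(n), size)  # draw samples without replacement
        selections.append(
            select_top_k([X[i] for i in idx], [y[i] for i in idx], k)
        )
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy data: variables 0 and 1 track the response, the other six are noise.
data_rng = random.Random(1)
y = [data_rng.gauss(0, 1) for _ in range(40)]
X = [
    [yi + data_rng.gauss(0, 0.05), -yi + data_rng.gauss(0, 0.05)]
    + [data_rng.gauss(0, 1) for _ in range(6)]
    for yi in y
]
stability = selection_reproducibility(X, y, k=2)
```

A value of `stability` near 1 means the same variables are picked on almost every sub-sample; values near 0 mean the selection changes from run to run. With strongly collinear spectra, the plain Jaccard index would penalise swaps between neighbouring, nearly equivalent wavelengths, which is presumably why the study uses a measure designed for that case.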

20.
Ultra high-performance liquid chromatography hyphenated to mass spectrometry (UHPLC-MS) technologies have been widely applied in metabolomics, and their high resolution and peak capacity are only some of the key aspects that are exploited in this and related fields. In the current study, we investigated whether low resolution chromatography, with the aid of multivariate data analyses, could be sufficient for a metabolic fingerprinting study that aims at discriminating between samples of different biological status or origin. UHPLC-MS data from chemically-treated Arabidopsis thaliana plants were used and chromatograms with different gradient lengths were compared. MarkerLynx™ technology was employed for data mining, followed by principal component analysis (PCA) and orthogonal projections to latent structure discriminant analysis (OPLS-DA) as multivariate statistical interpretations. The results showed that, despite the congestion in low resolution chromatograms (of 5 and 10 min), samples could be classified based on their respective biological background in a similar manner as when using chromatograms with better resolution (of 20 and 40 min). This paper thus underlines that, in a metabolic fingerprinting study, low resolution chromatography together with multivariate data analyses suffices for biological classification of samples. The results also suggest that, depending on the initial objective of the undertaken study, optimisation in chromatographic resolution prior to full scale metabolomics studies is mandatory.
