Similar articles
1.
The integration of multiple data sources has emerged as a pivotal aspect of assessing complex systems comprehensively. This new paradigm requires the ability to separate common and redundant information from specific and complementary information during the joint analysis of several data blocks. However, the inherent problems encountered when analysing single tables are amplified with the generation of multiblock datasets. Finding the relationships between data layers of increasing complexity therefore constitutes a challenging task. In the present work, an algorithm is proposed for the supervised analysis of multiblock data structures. It combines the interpretability of the orthogonal partial least squares (OPLS) framework with the ability of common component and specific weights analysis (CCSWA) to weight each data table individually in order to capture its specificities and efficiently handle the different sources of Y-orthogonal variation.

2.
More than one multi-informative analytical technique is often applied when describing the condition of a set of samples. Often a part of the information found in these data blocks is redundant and can be extracted from more than one block. This study puts forward a method (multiblock variance partitioning, MVP) to compare the information/variation in different data blocks using simple quantitative measures. These measures are the unique part of the variation found in only one data block and the common part that can be found in several data blocks. These different parts are found using PLS models between predictor blocks and a common response. MVP provides a different view on the information in different blocks than normal multiblock analysis. It will be shown that this has many applications in very diverse fields such as process control, assessor performance in sensory analysis, efficiency of preprocessing methods and as complementary information to an interval PLS analysis. Here the ideas of the MVP approach are presented in detail using a study of red wines from different regions measured with GC-MS and FT-IR instruments providing different kinds of data representations.

3.
Many experimental factors may have an impact on chemical or biological systems. A thorough investigation of the potential effects and interactions between the factors is made possible by rationally planning the trials using systematic procedures, i.e. design of experiments. However, assessing the influence of factors often remains a challenging task when dealing with hundreds to thousands of correlated variables while only a limited number of samples is available. In that context, most of the existing strategies involve the ANOVA-based partitioning of sources of variation and the separate analysis of ANOVA submatrices using multivariate methods, to account for both the intrinsic characteristics of the data and the study design. However, these approaches lack the ability to summarise the data using a single model and remain somewhat limited for detecting and interpreting subtle perturbations hidden in complex Omics datasets.

4.
In this work, a comparative study of two novel algorithms to perform sample selection in local regression based on Partial Least Squares Regression (PLS) is presented. These methodologies were applied to Near Infrared Spectroscopy (NIRS) quantification of five major constituents in corn seeds and are compared and contrasted with global PLS calibrations. Validation results show a significant improvement in prediction quality when local models implemented by the proposed algorithms are applied to large databases.

5.
Two novel algorithms that employ the idea of stacked generalization or stacked regression, stacked partial least squares (SPLS) and stacked moving-window partial least squares (SMWPLS), are reported in the present paper. The new algorithms establish parallel, conventional PLS models based on all intervals of a set of spectra to take advantage of the information from the whole spectrum by incorporating parallel models in a way to emphasize intervals highly related to the target property. It is theoretically and experimentally illustrated that the predictive ability of these two stacked methods combining all subsets or intervals of the whole spectrum is never poorer than that of a PLS model based only on the best interval. These two stacking algorithms generate more parsimonious regression models with better predictive power than conventional PLS, and perform best when the spectral information is neither isolated to a single, small region, nor spread uniformly over the response. A simulation data set is employed in this work not only to demonstrate this improvement, but also to demonstrate that stacked regressions have the potential capability of predicting property information from an outlier spectrum in the prediction set. Moisture, oil, protein and starch in Cargill corn samples have been successfully predicted by these new algorithms, as well as hydroxyl number for different instruments of terpolymer samples including and excluding an outlier spectrum. Copyright © 2009 John Wiley & Sons, Ltd.

6.
ComDim analysis was designed to assess the relationships between individuals and variables within a multiblock setting where several variables, organized in blocks, are measured on the same individuals. An overview of this method is presented together with some of its properties. Furthermore, we discuss a new extension of the method to the case of (K+1) datasets. More precisely, the aim is to explore the relationships between a response dataset and K other datasets. This latter strategy of analysis is illustrated with a case study involving Time-Domain Nuclear Magnetic Resonance data, and the outcomes are compared with those of Multiblock Partial Least Squares regression.

7.
In this paper, multivariate calibration of complicated process fluorescence data is presented. Two data sets related to the production of white sugar are investigated. The first data set comprises 106 observations and 571 spectral variables, and the second data set 268 observations and 3997 spectral variables. In both applications, a single response, ash content, is modelled and predicted as a function of the spectral variables. Both data sets contain certain features making multivariate calibration efforts non-trivial. The objective is to show how principal component analysis (PCA) and partial least squares (PLS) regression can be used to overview the data sets and to establish predictively sound regression models. It is shown how a recently developed technique for signal filtering, orthogonal signal correction (OSC), can be applied in multivariate calibration to enhance predictive power. In addition, signal compression is tested on the larger data set using wavelet analysis. It is demonstrated that a compression down to 4% of the original matrix size (in the variable direction) is possible without loss of predictive power. It is concluded that the combination of OSC for pre-processing and wavelet analysis for compression of spectral data is promising for future use.

8.
When quantifying information in metabolomics, the results are often expressed as data carrying only relative information. Vectors of these data have positive components, and the only relevant information is contained in the ratios between their parts; such observations are called compositional data. The aim of the paper is to demonstrate how partial least squares discriminant analysis (PLS-DA), the most widely used method in chemometrics for multivariate classification, can be applied to compositional data. Theoretical arguments are provided, and data sets from metabolomics are investigated. The data are related to the diagnosis of inherited metabolic disorders (IMDs). The first example analyzes the significance of the corresponding regression parameters (metabolites) using a small data set resulting from targeted metabolomics, where just a subset of potential markers is selected. The second example, which follows the approach of untargeted metabolomics, involves an analysis detecting almost 500 metabolites. The significance of the metabolites is investigated by applying PLS-DA adapted to a compositional approach. The significance of important metabolites (markers of diseases) is more clearly visible with the compositional method in both examples. Cross-validation also leads to better results when the compositional approach is used. Copyright © 2014 John Wiley & Sons, Ltd.

9.
Several approaches for investigating the relationships between two datasets in which the individuals are structured into groups are discussed. These strategies fit within the framework of partial least squares (PLS) regression. Each strategy of analysis is introduced on the basis of a maximization criterion, which involves the covariances between components associated with the groups of individuals in each dataset. Thereafter, algorithms are proposed to solve these maximization problems. The strategies of analysis can be considered as extensions of multi-group principal components analysis to the context of PLS regression. Copyright © 2014 John Wiley & Sons, Ltd.

10.
This work presents a novel method for the simultaneous spectrophotometric determination of phosphate and silicate using cross injection analysis (CIA) coupled with partial least squares (PLS) for data evaluation. The detection principle is based on the well-known 'molybdenum blue' method: in the presence of stannous chloride in acidic medium, molybdate ions give phosphomolybdenum blue and silicomolybdenum blue as products. All liquids, including sample and reagents, were introduced simultaneously into a CIA platform using two peristaltic pumps to control the x-channel and y-channel flows, which were manipulated automatically by an in-house control board. The cross flow provides sufficient mixing inside the platform prior to detection of the absorption spectra of the blue complexes over the 400–900 nm wavelength range. Because the spectra of the blue products of phosphate and silicate are similar, the two analytes interfere with one another, which makes their simultaneous analysis difficult. PLS was therefore used to assist the CIA system in the simultaneous analysis of phosphate and silicate by the molybdenum blue reaction, without modifying the reagents or adding a selective masking agent. The calibration ranges are 0.1–6 mg P L⁻¹ and 5–100 mg Si L⁻¹ for phosphate and silicate, respectively. With CIA coupled with PLS for data evaluation, the analysis of the two analytes was achieved within 1.5 min with only a single injection. The developed system was applied to natural water samples and validated against conventional methods. A paired t-test showed no evidence of a significant difference at the 95% confidence level (tstat = 2.28, tcritical = 2.31 for phosphate and tstat = 0.62, tcritical = 2.31 for silicate). This implies that the chemometrics-assisted CIA system was successfully developed for the simultaneous spectrophotometric determination of phosphate and silicate.
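
The validation step in item 10 rests on the standard paired t-test decision rule: the null hypothesis of no difference between methods is retained when |tstat| is below tcritical. The following is a minimal sketch of that rule in Python, not the authors' procedure. The concentration values are hypothetical (the abstract reports only the test statistics and critical values), and the function names `paired_t` and `no_significant_difference` are illustrative. A critical value of 2.31 matches the two-sided 95% value for 8 degrees of freedom, which would correspond to nine paired samples, but the abstract does not state the sample count, so that is an inference.

```python
import math


def paired_t(a, b):
    """Paired t statistic for two equal-length series of measurements."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two series of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the paired differences (n - 1 in the denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("all paired differences are identical")
    return mean / math.sqrt(var / n)


def no_significant_difference(t_stat, t_critical):
    """True when the null hypothesis (no difference between methods) is retained."""
    return abs(t_stat) < t_critical


if __name__ == "__main__":
    # Hypothetical paired results, NOT data from the abstract.
    proposed = [1.0, 2.0, 3.0, 4.0, 5.0]
    reference = [1.1, 2.1, 2.9, 4.2, 5.2]
    print("t statistic for the hypothetical data:", round(paired_t(proposed, reference), 3))

    # Statistics quoted in the abstract, checked against the quoted critical value.
    print("phosphate:", no_significant_difference(2.28, 2.31))
    print("silicate: ", no_significant_difference(0.62, 2.31))
```

For phosphate the statistic (2.28) lies only just below the critical value (2.31), so the conclusion of agreement is much less comfortable for phosphate than for silicate.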

11.
The complexity of metabolic profiles makes chemometric tools indispensable for extracting the most significant information. Partial least-squares discriminant analysis (PLS-DA) is one of the most effective strategies for data analysis in metabonomics. However, its actual efficacy in metabonomics is often weakened by the high similarity of metabolic profiles, which contain excessive variables. To rectify this situation, particle swarm optimization (PSO) was introduced to improve PLS-DA by simultaneously selecting the optimal sample and variable subsets, the appropriate variable weights, and the best number of latent variables (SVWL) in PLS-DA, forming a new algorithm named PSO-SVWL-PLSDA. Combined with ¹H nuclear magnetic resonance-based metabonomics, PSO-SVWL-PLSDA was applied to distinguish patients with lung cancer from healthy controls. PLS-DA was also investigated for comparison. PLS-DA yielded recognition rates of 86% and 65% for the training and test sets, respectively, whereas PSO-SVWL-PLSDA achieved 98.3% and 90%. Moreover, several of the most discriminative metabolites were identified by PSO-SVWL-PLSDA to aid the diagnosis of lung cancer, including lactate, glucose (α-glucose and β-glucose), threonine, valine, taurine, trimethylamine, glutamine, glycoprotein, proline, and lipid. Copyright © 2015 John Wiley & Sons, Ltd.

12.
Dimension reduction is a crucial technique in machine learning and data mining that is widely used in medicine, bioinformatics and genetics. In this paper, we propose a two-stage local dimension reduction approach for classification on microarray data. In the first stage, a new L1-regularized feature selection method is defined to remove irrelevant and redundant features and to select the important features (biomarkers). In the next stage, PLS-based feature extraction is applied to the selected features to extract synthetic features that best reflect the discriminating characteristics for classification. The suitability of the proposal is demonstrated in an empirical study with ten widely used microarray datasets, and the results show its effectiveness and competitiveness compared with four state-of-the-art methods. The experimental results on the St Jude dataset show that our method can be effectively applied to microarray data analysis for subtype prediction and the discovery of gene coexpression.

13.
Multivariate chemical data often contain elements that are missing completely at random and the so-called left-censored elements whose values are only known to be below a definite threshold value (reporting limit). In the last several years, attention has been paid to developing methods for dealing with data containing missing elements and those that can handle data with missing elements and outliers. However, processing data with both missing and left-censored elements is still an ongoing problem.

14.
Raman chemical imaging provides chemical and spatial information about pharmaceutical drug products. By applying resolution methods to the acquired spectra, the objective is to calculate pure spectra and distribution maps of the image compounds. With multivariate curve resolution-alternating least squares, constraints are used to improve the performance of the resolution and to decrease the ambiguity linked to the final solution. Non-negativity and spatial local rank constraints have been identified as the most powerful constraints to be used.

15.
In the detection of drug body packing, skin is a typical interference factor. In this paper, multivariate data analysis is proposed to analyze the impact of fat and muscle on heroin identification based on the profile of energy dispersive X-ray diffraction spectra. In the space of principal components, the results showed that different pure samples (heroin, muscle, fat) clustered in different areas, whereas mixture samples were located between the pure samples. The impact of fat and muscle thus lies in moving the feature points between those of the pure materials in the space of principal components. Furthermore, a model of heroin covered by fat and muscle of different thicknesses was set up, and a linear relationship was shown to be suitable. Our findings indicate that multivariate data analysis would be a promising method in the detection of drug body packing. Copyright © 2011 John Wiley & Sons, Ltd.

16.
Simultaneous quantification of brimonidine tartrate (BRI) and timolol maleate (TIM) in an eye drop formulation was performed by applying parallel factor analysis (PARAFAC) and trilinear (three-way) partial least squares (3W-PLS1) to the ultra performance liquid chromatography-photodiode array (UPLC-PDA) data array. In the PARAFAC and 3W-PLS1 applications, the co-elution of the related compounds in their chromatograms, obtained in the presence of ornidazole as an internal standard (IS), was resolved, and then the analyses were performed. In addition, a new conventional ultra performance liquid chromatography (UPLC) method was developed after long and tedious studies to obtain the desired elution of BRI and TIM in a chromatogram, using a column and mobile phase system different from those of the PARAFAC and 3W-PLS1 applications. The performance and validity of all the proposed methods were confirmed by analyzing independent validation samples consisting of synthetic mixtures, intraday and interday samples, and standard addition samples. The results for BRI and TIM in eye drop samples obtained by chemometric PARAFAC and 3W-PLS1 and by conventional UPLC were statistically compared with each other. It was concluded that PARAFAC and 3W-PLS1 offer a shorter analysis time and lower cost than the developed conventional UPLC method for the analysis of the related compounds in a commercial eye drop preparation, with adequate selectivity and sensitivity.

17.
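Simultaneous determination of vitamin E oleate, vitamin E and oleic acid by ultraviolet spectroscopy   Total citations: 1 (self-citations: 0, citations by others: 1)
A method was established for the simultaneous determination of vitamin E, oleic acid and vitamin E oleate in a mixed system, with UV-visible transmission spectra of the mixture acquired by a fiber-optic spectrometer. The calibration and prediction sets were constructed according to a uniform design. A calibration model for the simultaneous quantitative determination of the three components was built by partial least squares (PLS) over the 255–315 nm band, and the prediction model for oleic acid was improved by selecting the optimal modelling interval with interval partial least squares (iPLS). iPLS significantly improved model accuracy, especially for substances with a weak spectral response: the maximum relative error fell from 54.7% with direct PLS modelling to 8.98% with iPLS. The resulting model meets the needs of in situ analysis for kinetic studies.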

18.
This article discusses problems of validating classification models especially in datasets where sample sizes are small and the number of variables is large. It describes the use of percentage correctly classified (%CC) as an indicator for success of a classification model. For small datasets, %CC should not be used uncritically and its interpretation depends on sample size. It illustrates the use of a common classification method, discriminant partial least squares (D-PLS) on a randomly generated dataset of 200 samples and 200 variables.

An aim of the classifier is to determine whether the null hypothesis (there is no distinction between two classes) can be rejected. Autoprediction gives an 84.5% CC. It is shown that, if there is variable selection, it must be performed independently on the training set to obtain a CC close to 50% on the test set; otherwise, over-optimistic and false conclusions can be reached about the ability to classify samples into groups.

Finally, two aims of determining the quality of a model are frequently confused, namely optimisation (often used to determine the most appropriate number of components in a model) and independent validation; to overcome this, the data should be split into three groups.

There are often difficulties with model building if validation and optimisation have been done on different groups of samples, especially using iterative methods, each group being modelled using properties, such as a different number of components or different variables.
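
The point made in item 18 about variable selection can be reproduced on purely random data. The sketch below is an illustration under stated assumptions, not the article's procedure: it uses a simple nearest-centroid classifier in place of D-PLS, ranks variables by the absolute difference of class means, keeps 20 of 200 variables, and averages over 20 simulated datasets. All function names and those settings are hypothetical choices. It compares the test-set fraction correctly classified when variables are selected on all samples (test set included) with the fraction obtained when they are selected on the training set only.

```python
import random


def mean_diff(X, y, rows, j):
    """Difference of class means for variable j, computed on the given rows only."""
    ones = [X[i][j] for i in rows if y[i] == 1]
    zeros = [X[i][j] for i in rows if y[i] == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def select(X, y, rows, k):
    """Indices of the k variables with the largest absolute class-mean difference."""
    ranked = sorted(range(len(X[0])),
                    key=lambda j: abs(mean_diff(X, y, rows, j)),
                    reverse=True)
    return ranked[:k]


def centroid_accuracy(X, y, train, test, cols):
    """Fraction of test samples correctly classified by nearest class centroid."""
    cents = {}
    for c in (0, 1):
        members = [i for i in train if y[i] == c]
        cents[c] = [sum(X[i][j] for i in members) / len(members) for j in cols]
    hits = 0
    for i in test:
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, cents[c]))
                for c in (0, 1)}
        pred = 0 if dist[0] <= dist[1] else 1
        hits += pred == y[i]
    return hits / len(test)


def one_trial(rng, n=200, p=200, k=20):
    """One random dataset: returns (accuracy with leaky selection, with honest selection)."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [0] * (n // 2) + [1] * (n // 2)  # class labels carry no information about X
    train, test = [], []
    for c in (0, 1):  # stratified 50/50 split into training and test sets
        idx = [i for i in range(n) if y[i] == c]
        rng.shuffle(idx)
        train += idx[: len(idx) // 2]
        test += idx[len(idx) // 2:]
    leaky_cols = select(X, y, train + test, k)  # selection has seen the test samples
    honest_cols = select(X, y, train, k)        # selection uses the training set only
    return (centroid_accuracy(X, y, train, test, leaky_cols),
            centroid_accuracy(X, y, train, test, honest_cols))


def run(trials=20, seed=0):
    """Average both test-set accuracies over several random datasets."""
    rng = random.Random(seed)
    results = [one_trial(rng) for _ in range(trials)]
    leaky = sum(r[0] for r in results) / trials
    honest = sum(r[1] for r in results) / trials
    return leaky, honest


if __name__ == "__main__":
    leaky, honest = run()
    print(f"variables selected on all samples:    {leaky:.3f}")
    print(f"variables selected on training only:  {honest:.3f}")
```

Because the data contain no real class structure, the training-only figure should sit near 0.5, while selecting on all samples should give a visibly higher and therefore misleading figure, which is the over-optimism the abstract warns about.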


19.
An optimized multivariate classification model for the monitoring of eighteen spring waters in the area of Serra St. Bruno, Calabria, Italy, has been developed. Thirty analytical parameters for each water source were investigated and reduced to eight by means of Principal Component Analysis (PCA). The springs were grouped into five distinct classes by cluster analysis (CA), and a model for their classification was built by a Partial Least Squares–Discriminant Analysis (PLS–DA) procedure. The model was optimized and validated and then applied to new data matrices containing the analytical parameters measured on the same sources during successive years. The model proved able to detect deviations in the overall analytical characteristics by revealing, over time, a different distribution of the samples within the classes. Variation in nitrate concentration was shown to be the main cause of the observed class shifts. The shifting sources were located in areas used as arable land, and the high variability of nitrate content was ascribed to the practice of crop rotation, which involves a varying use of nitrogenous chemical fertilizers.

20.
An evaluation of the computational performance and the precision, in terms of cross-validation error, of five partial least squares (PLS) algorithms (NIPALS, modified NIPALS, Kernel, SIMPLS and bidiagonal PLS), available and widely used in the literature, is presented. When dealing with large data sets, computational time is an important issue, mainly in cross-validation and variable selection. In the present paper, the PLS algorithms are compared in terms of run time and the relative error in the precision obtained when performing leave-one-out cross-validation using simulated and real data sets. The simulated data sets were investigated through factorial and Latin square experimental designs. The evaluations were based on the number of rows, the number of columns and the number of latent variables. With respect to performance, the results for both simulated and real data sets show that the differences in run time are statistically significant. Bidiagonal PLS is the fastest algorithm, followed by Kernel and SIMPLS. Regarding cross-validation error, all algorithms showed similar results. However, in some situations, for example when many latent variables were involved, discrepancies were observed, especially for SIMPLS. Copyright © 2010 John Wiley & Sons, Ltd.
