Similar documents
1.
The work summarized in this paper presents the first part of a three‐paper series on robust partial least squares (RPLS) regression. Motivated by recent research activity in this area, this part provides a detailed algorithmic analysis of the associated techniques. It shows that existing work (i) may not represent a truly robust formulation of partial least squares (PLS), (ii) may lead to convergence problems, or (iii) may be insensitive to a certain type of outlier. On the basis of this analysis, Part I introduces a new conceptual RPLS algorithm that overcomes these deficiencies. The second part details the new RPLS technique, compares its performance with existing RPLS methods and analyses the computational efficiency and sensitivity of these algorithms. Whilst the first two parts discuss algorithmic developments of RPLS, the final part concentrates on practical issues of RPLS implementation. This third part is aimed at practitioners in chemistry and chemical engineering. It covers a wide range of applications: a calibration experiment, the analysis of recorded data from an industrial debutanizer process, and data from a number of Raman spectroscopy experiments. Copyright © 2007 John Wiley & Sons, Ltd.

2.
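Simultaneous determination of the precious metals osmium and ruthenium by robust partial least squares spectrophotometry   Total citations: 3, self-citations: 0, citations by others: 3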
Song Haowei, Wang Hongyan. Chinese Journal of Analytical Chemistry, 1996, 24(10): 1162-1165
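In this paper, robust partial least squares is applied to the simultaneous spectrophotometric determination of the precious metals osmium and ruthenium. The approach largely resolves a problem with practical calibration models: when experimental errors deviate from a normal distribution, the precision of the computed results is degraded.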

3.
Analytical Letters, 2012, 45(1): 171-183
Based on wavelet transformation (WT) and mutual information (MI), a simple and effective procedure is proposed for multivariate calibration of near-infrared (NIR) spectroscopy. In this procedure, the original spectra of the training set are first transformed into a set of wavelet representations by the wavelet prism transform. The MI value between each wavelet coefficient variable and the dependent variable is then calculated, resulting in an MI spectrum. By retaining a subset of coefficients with higher MI, an updated training set of wavelet coefficients is obtained and then reconstructed (converted back) to the original domain. A partial least squares (PLS) model is then constructed and optimized on this basis. The optimal wavelet and decomposition level are determined experimentally. A quantitative NIR problem, the determination of total sugar in tobacco, is used to demonstrate the overall performance of the proposed procedure. The procedure is named RPLS, meaning PLS in the reconstructed original domain coupled with MI-induced variable selection in the wavelet domain. Three procedures are used as references: conventional full-spectrum PLS in the original domain (FPLS), PLS in the original domain coupled with MI-induced variable selection (OPLS), and direct PLS on MI-selected wavelet coefficients (WPLS). The results confirm that RPLS can build more accurate and robust calibration models without increasing complexity.
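The MI-ranking step described above can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes a simple histogram (plug-in) MI estimator with a hypothetical bin count of 8, and it applies the estimator to generic columns rather than to wavelet-prism coefficients. The data are synthetic, and the wavelet transform, reconstruction and PLS steps are not reproduced.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Plug-in (histogram) estimate of I(x; y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0                         # empty cells contribute nothing
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def select_by_mi(X, y, n_keep, bins=8):
    """Indices (ascending) of the n_keep columns of X with the highest MI with y."""
    mi = np.array([mutual_information(X[:, j], y, bins) for j in range(X.shape[1])])
    top = np.argsort(mi)[::-1][:n_keep]
    return np.sort(top), mi


# Synthetic example (hypothetical data, not from the paper):
rng = np.random.default_rng(0)
n = 500
y = rng.normal(size=n)
X = rng.normal(size=(n, 10))                 # mostly uninformative variables
X[:, 3] = y + 0.1 * rng.normal(size=n)       # linear dependence on y
X[:, 7] = y ** 2 + 0.1 * rng.normal(size=n)  # nonlinear, nearly uncorrelated with y
keep, mi = select_by_mi(X, y, n_keep=2)
print(keep)
```

Column 7 has almost zero linear correlation with y, yet MI ranks it highly. This is the usual argument for MI over correlation-based variable ranking.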

4.
A Plackett‐Burman type dataset from a paper by Williams [1], with 28 observations and 24 two‐level factors, has become a standard dataset for illustrating the construction (by halving) of supersaturated designs (SSDs) and the corresponding data analysis. The aim here is to point out that, for several reasons, this is an unfortunate situation. The original paper by Williams contains several errors and misprints. Some are in the design matrix, which is reconstructed here. Worse, there is an outlier in the response values, which becomes apparent when the data are plotted against the dominant factor. In addition, the data are better analysed on a log scale than on the original scale. The implications of the outlier for SSD analysis are drastic. It is concluded that the data should be used for this purpose only if the outlier is properly treated (omitted or modified). Copyright © 2008 John Wiley & Sons, Ltd.

5.
This work proposes a modification to the successive projections algorithm (SPA) aimed at selecting spectral variables for multiple linear regression (MLR) in the presence of unknown interferents not included in the calibration data set. The modified algorithm favours the selection of variables in which the effect of the interferent is less pronounced. The proposed procedure can be regarded as an adaptive modelling technique, because the spectral features of the samples to be analyzed are considered in the variable selection process. The advantages of this new approach are demonstrated in two analytical problems, namely (1) ultraviolet–visible spectrometric determination of tartrazine, allure red and sunset yellow in aqueous solutions under the interference of erythrosine, and (2) near-infrared spectrometric determination of ethanol in gasoline under the interference of toluene. In these case studies, the performance of conventional MLR-SPA models is substantially degraded by the presence of the interferent. This problem is circumvented by applying the proposed Adaptive MLR-SPA approach, which results in prediction errors smaller than those obtained by three other multivariate calibration techniques, namely stepwise regression, full-spectrum partial-least-squares (PLS) and PLS with variables selected by a genetic algorithm. An inspection of the variable selection results reveals that the Adaptive approach successfully avoids spectral regions in which the interference is more intense.
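For reference, the core of the classic (non-adaptive) SPA can be sketched in a few lines of NumPy. The sketch assumes a fixed starting variable `k0` and a fixed number of selected variables. It does not reproduce the interferent-aware weighting of the adaptive variant, nor the usual validation-based choice of `k0` and of the number of variables. The function name `spa` and the small matrix are hypothetical.

```python
import numpy as np


def spa(X, k0, n_vars):
    """Classic successive projections algorithm (SPA) for variable selection.

    X      : (n_samples, n_variables) calibration matrix
    k0     : index of the starting variable
    n_vars : number of variables to select
    Returns the selected column indices in selection order.
    """
    Xp = np.asarray(X, dtype=float).copy()
    selected = [k0]
    for _ in range(n_vars - 1):
        ref = Xp[:, selected[-1]]
        # Project every column onto the orthogonal complement of the last selected column.
        Xp = Xp - np.outer(ref, ref @ Xp) / (ref @ ref)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # never reselect a variable
        selected.append(int(np.argmax(norms)))
    return selected


X = np.array([[1.0, 1.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])
print(spa(X, k0=0, n_vars=3))  # [0, 3, 2]
```

Column 1 is nearly collinear with the starting column 0, so after the first projection its residual norm is small and it is never chosen. This is how SPA limits collinearity among the variables passed to MLR.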

6.
An algorithm has been developed for packing polypeptide chains by energy minimization subject to regularity conditions. Regularity is maintained without adding pseudoenergy terms, by defining the energy as a function of appropriately chosen independent variables. The gradient of the energy with respect to the independent variables is calculated analytically. The speed and efficiency of convergence of the algorithm to a local energy minimum are comparable to those of existing algorithms for minimizing the energy of a single polypeptide chain. The algorithm has been used to reinvestigate the minimum-energy regular structures of three-stranded (L-Ala)₈, three-stranded (L-Val)₆, five-stranded (L-Ile)₆, and the regular and truncated three-stranded (Gly-L-Pro-L-Pro)₄ triple helices. Local minima with improved packing energies, but with essentially unchanged geometrical properties, were obtained in all cases. The algorithm was also used to reinvestigate the structures proposed previously for the I and II forms of crystalline silk fibroin. The silk II structure was reproduced with slightly improved packing and little other change. The orthorhombic silk I structure showed more change and considerably improved packing energy, but the new regular monoclinic silk I structure had considerably higher energy. The results support the structure proposed previously for silk II and the orthorhombic structure, but not the monoclinic structure proposed for silk I. © 1994 by John Wiley & Sons, Inc.

7.
Datasets of molecular compounds often contain outliers, that is, compounds that differ from the rest of the dataset. Outliers, while often interesting, may affect data interpretation, model generation, and decision making, and should therefore be removed from the dataset prior to modeling. Here, we describe a new method for the iterative identification and removal of outliers based on a k‐nearest neighbors optimization algorithm. We demonstrate this for three different datasets. The filtered datasets produced by the new algorithm are better than those produced by four alternative outlier removal procedures, and better than random compound removal, in two important respects: (1) they better maintain the diversity of the parent datasets; (2) they give rise to quantitative structure-activity relationship (QSAR) models with much better prediction statistics. The new algorithm is therefore suitable for pretreating datasets prior to QSAR modeling. © 2014 Wiley Periodicals, Inc.
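As a rough illustration of kNN-based iterative outlier removal, the sketch below repeatedly drops the compound whose mean distance to its k nearest neighbours is largest. Distances are recomputed after each removal. This is a simplified heuristic assumed for illustration, not the optimization algorithm of the paper. The descriptor matrix, `k` and the number of removals are hypothetical.

```python
import numpy as np


def knn_outlier_removal(X, k, n_remove):
    """Iteratively drop the point whose mean distance to its k nearest
    neighbours is largest; distances are recomputed after every removal.
    Returns (kept original indices, removed original indices)."""
    X = np.asarray(X, dtype=float)
    keep = list(range(len(X)))
    removed = []
    for _ in range(n_remove):
        sub = X[keep]
        d = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
        knn_mean = np.sort(d, axis=1)[:, :k].mean(axis=1)
        worst = int(np.argmax(knn_mean))
        removed.append(keep.pop(worst))  # map back to the original index
    return keep, removed


# Hypothetical descriptor matrix: 30 compounds, 5 descriptors, two planted outliers.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
X[4] += 12.0
X[17] -= 6.0
keep, removed = knn_outlier_removal(X, k=3, n_remove=2)
print(sorted(removed))
```

Recomputing distances after each removal matters when outliers form small clusters. Removing one member can expose the others.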

8.
The anisotropy of crystalline materials results in different physical and chemical properties on different facets, which warrants in-depth investigation. Macroscopically facet-tuned, high-purity gallium nitride (GaN) single crystals were synthesised and machined. The electrocatalytic hydrogen evolution reaction (HER) was used as a model reaction to show the differences among the facets. DFT calculations revealed that the Ga and N sites of GaN (100) have a considerably smaller ΔG_H* value than the metal Ga site of GaN (001) or the N site of GaN (00−1). This indicates that GaN (100) should be more catalytically active for the HER on account of its nonpolar facet. Subsequent experiments confirmed that the electrocatalytic performance of GaN (100) was considerably higher than that of the other facets for both acidic and alkaline HER. Moreover, the GaN crystal with a preferentially exposed (100) facet showed excellent durability in the alkaline HER for more than 10 days. This work provides fundamental insights into the intrinsic properties of materials and into the design of advanced materials for physicochemical applications.

9.
Ni Xin, Qinghua Meng, Yizhen Li, Yuzhu Hu. Chinese Journal of Chemistry, 2011, 29(11): 2533-2540
This paper explores the possibility of using near infrared (NIR) spectral similarity as a rapid method to estimate the quality of Flos Lonicerae. Variable selection combined with modelling techniques is used to select representative variables, which are then used to calculate the similarity. NIR spectra are used to build calibration models that predict the bacteriostatic activity of Flos Lonicerae, which is determined by an in vitro experiment. Models are built for Gram‐positive and for Gram‐negative bacteria. A genetic algorithm combined with partial least squares regression (GA‐PLS) is used to perform the calibration. The GA‐PLS models are compared with interval partial least squares (iPLS) models, full‐spectrum PLS and full‐spectrum principal component regression (PCR) models. The variables in the two GA‐PLS models are then combined and used to calculate the NIR spectral similarity of the samples. Similarity based on the characteristic variables and similarity based on the full spectrum are each used to evaluate the fingerprints of Flos Lonicerae. The results show that combining variable selection, modelling techniques and similarity analysis may be a powerful tool for quality control of traditional Chinese medicine (TCM).
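The abstract does not specify the similarity measure. A common choice for spectral similarity is the cosine of the angle between two spectra, which equals the correlation coefficient when the spectra are mean-centred. The sketch below assumes the cosine and shows how restricting the calculation to selected variables changes the result. The spectra and the index set are hypothetical.

```python
import numpy as np


def spectral_similarity(a, b, idx=None):
    """Cosine similarity between two spectra, optionally restricted to the
    variables listed in idx (e.g. variables retained by GA-PLS)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if idx is not None:
        a, b = a[idx], b[idx]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


reference = [1.0, 2.0, 3.0, 4.0]
sample = [1.0, 2.0, 3.0, 8.0]  # differs only at the last variable
print(round(spectral_similarity(reference, sample), 4))             # 0.9509
print(round(spectral_similarity(reference, sample, [0, 1, 2]), 4))  # 1.0
```

If the differing variable carries no quality-relevant information, excluding it through variable selection makes the similarity reflect only the characteristic variables.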

10.
This paper introduces a class of methods to infer the relationship between observations and variables in latent subspace models. The approach is a modification of the recently proposed missing data methods for exploratory data analysis (MEDA). MEDA is useful to identify the structure in the data and also to interpret the contribution of each latent variable. In this paper, MEDA is augmented with dummy variables to find the data variables related to a given deviation detected among observations, for instance, the difference between one cluster of observations and the bulk of the data. The MEDA extension, referred to as observation‐based MEDA or oMEDA, can be performed in several ways, one of which is theoretically shown to be equivalent to a comparison of means between groups. The use of the proposed approach is demonstrated with a number of examples with simulated data and a real data set of archeological artifacts. Copyright © 2011 John Wiley & Sons, Ltd.

11.
The non-linear regression technique known as the alternating conditional expectations (ACE) method is only applicable when the number of objects available for calibration is considerably greater than the number of predictors considered. The non-linear regression technique presented here is alternating conditional expectations regression with selection of significant predictors by genetic algorithms (GA-ACE). It is based on the ACE algorithm but introduces several modifications that resolve the applicability limitations of the original ACE method, thus facilitating the practical implementation of a very interesting calibration tool. The original ACE algorithm lacks reliability on data sets with an excessively large number of variables. To overcome this, GA-ACE first uses genetic algorithms as a variable selection technique, before the non-linear regression model is developed. The genetic algorithms select a reduced subset of significant predictors that can accurately model and predict the response of interest. Furthermore, GA-ACE provides two alternative application approaches. One is prior data compression: a number of principal components are computed and then subjected to GA selection. The other is working directly on the original variables. In this study, GA-ACE was applied to two real calibration problems with a very low observation/variable ratio (NIR data). The results were compared with those obtained by several commonly used linear regression techniques. The GA-ACE non-linear method produced notably improved regression models for the two modelled response variables. The root mean square errors of the residuals in external prediction (RMSEP) were 11.51% and 6.03% for the moisture and lipid contents of roasted coffee samples, respectively.
The improvement is all the more remarkable given the results obtained with the best-performing linear method (IPW-PLS) for the same responses (RMSEP of 14.61% and 7.74%, respectively).
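RMSEP is the root mean square of the prediction residuals on an external test set, RMSEP = sqrt(sum((y_pred_i - y_i)^2) / n). A minimal sketch follows. The values are hypothetical, not the coffee data of the study.

```python
import numpy as np


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on an external (test) set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


# Hypothetical reference values and predictions, in % units:
print(round(rmsep([0, 25, 50, 75, 100], [3, 22, 54, 75, 96]), 4))  # 3.1623
```

RMSEP is reported in the units of the response, so errors on percentage contents are in percentage points.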

12.
In this work, different approaches to variable selection are studied in the context of near-infrared (NIR) multivariate calibration of textiles. The first is a model-based regression method consisting of genetic algorithm optimisation combined with partial least squares regression (GA-PLS). The second is a relevance measure of spectral variables based on mutual information (MI), which can be computed independently of any given regression model. Because MI makes no assumption about the relationship between X and Y, non-linear methods such as feed-forward artificial neural networks (ANN) are favoured for modelling in a prediction context (MI-ANN). GA-PLS and MI-ANN models are developed for NIR quantitative prediction of cotton content in cotton-viscose textile samples. The results are compared with a full-spectrum (480 variables) PLS model (FS-PLS). The FS-PLS model requires 11 latent variables and yields an RMS prediction error of 3.74% over the range 0-100%. GA-PLS provides a more robust model based on 120 variables, with slightly better prediction performance (3.44% RMS error). The MI variable selection procedure gives a great improvement, as only 12 variables are retained. A 12-input ANN trained on these variables gives an RMS prediction error of 3.43%.

13.
Analytica Chimica Acta, 2004, 515(1): 87-100
The goal of the present work is to analyse the effect of non-informative variables (NIV) in a data set on cluster analysis. It also proposes a computationally feasible method for detecting and removing these variables. The proposed method uses a genetic algorithm to select the variables that make the presence of groups in the data clear. The procedure has been implemented for use with k-means, with the cluster silhouettes as the fitness function of the genetic algorithm. The main problem when applying the method to real data is that, in general, the real cluster structure (number and composition of the groups) is not known a priori. The work explores how the silhouette values of the k-means clusters evolve when non-informative variables are added to the original data set. This is done both for a data set from the literature and for simulated data of higher dimension. The procedure has also been applied to real data sets.
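The silhouette fitness described above can be illustrated with a small computation. The sketch assumes a fixed partition and synthetic data. In the paper the partition comes from k-means inside a genetic-algorithm loop, which is not reproduced here. Adding one non-informative, high-variance variable lowers the mean silhouette of the same two groups. This is the effect a GA can exploit when it searches for variable subsets with a high silhouette.

```python
import numpy as np


def mean_silhouette(X, labels):
    """Average silhouette width, s(i) = (b - a) / max(a, b), over all points.
    Requires at least two clusters; singleton clusters get s(i) = 0."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue  # singleton cluster: s(i) = 0 by convention
        a = d[i, same].mean()  # mean distance within own cluster
        b = min(d[i, labels == c].mean()  # nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())


# Synthetic data: two groups separated along one informative variable.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
informative = np.concatenate([rng.normal(0, 0.5, 20), rng.normal(5, 0.5, 20)])
noise = rng.normal(0, 5, 40)  # hypothetical non-informative variable
s_good = mean_silhouette(informative[:, None], labels)
s_noisy = mean_silhouette(np.column_stack([informative, noise]), labels)
print(round(s_good, 3), round(s_noisy, 3))
```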

14.
A new four‐way multiblock method is proposed to study the links between more than two sets of data tables (several multiblocks) measured on the same observations. This method, called STATIS‐4, generalizes the STATIS method to more than one set of matrices. In its first step, STATIS‐4 searches for one consensus for each multiblock and a global consensus that summarizes all of them as well as possible. Graphical representations can be used to visualize the proximities between the tables within a multiblock and between all the multiblocks. Plots of the observations can also be made for each table, for each multiblock and for the global consensus. The theory of STATIS‐4 and the algorithm used to obtain the optimal solutions are presented. A real sensory dataset is also analysed with STATIS‐4. Copyright © 2008 John Wiley & Sons, Ltd.
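STATIS-4 builds on classical STATIS. In classical STATIS, each table is turned into a normalised cross-product matrix and the tables are compared through RV coefficients. The leading eigenvector of the RV matrix then gives the weights of the compromise. The sketch below shows only that classical single-multiblock step, as it is commonly described; it is not the STATIS-4 algorithm. The sum-to-one weight scaling and the synthetic tables are assumptions.

```python
import numpy as np


def cross_product(T):
    """Column-centred cross-product matrix W = T T^T, scaled to unit Frobenius norm."""
    Tc = T - T.mean(axis=0)
    W = Tc @ Tc.T
    return W / np.linalg.norm(W)


def statis_compromise(tables):
    """Classical STATIS on tables sharing the same rows (observations).
    Returns the RV matrix, the compromise weights and the compromise matrix."""
    Ws = [cross_product(np.asarray(T, dtype=float)) for T in tables]
    # With unit-norm W's, the Frobenius inner product is the RV coefficient.
    rv = np.array([[np.sum(Wi * Wj) for Wj in Ws] for Wi in Ws])
    _, eigvec = np.linalg.eigh(rv)  # eigenvalues in ascending order
    u = np.abs(eigvec[:, -1])       # leading eigenvector; entries share one sign
    alpha = u / u.sum()             # weights scaled to sum to 1 (one convention)
    compromise = sum(a * W for a, W in zip(alpha, Ws))
    return rv, alpha, compromise


rng = np.random.default_rng(0)
T1 = rng.normal(size=(6, 3))
T2 = 3.0 * T1 + 1.0               # same configuration, rescaled and shifted
T3 = rng.normal(size=(6, 4))      # unrelated table on the same 6 observations
rv, alpha, C = statis_compromise([T1, T2, T3])
print(np.round(rv, 3))
print(np.round(alpha, 3))
```

T1 and T2 have an RV coefficient of 1 because RV ignores centring and global scaling. Because they agree with each other, they receive larger weights than the unrelated table T3.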

15.
The kinetic scheme of nylon-6 polymerization consists of ring opening, polyaddition, step growth, reaction with monofunctional acids and cyclization. A set of ordinary differential equations (an initial value problem) has been solved. The equations govern the concentrations and moments of the reacting species, together with the energy balance for batch reactors. We propose a semianalytic technique, somewhat similar to the finite element method, in which the conversion domain is divided into sequential subdomains. Within each subdomain, a series solution for the state variables is assumed in terms of the incremental conversion of caprolactam, and the coefficients are obtained from the balance equations. This technique handles the nonlinearity of the problem naturally and involves the sequential evaluation of constant series coefficients. It gives results comparable to those of Gear's algorithm (which involves the evaluation of functions) in far fewer steps. The scheme can easily be implemented on a PC-XT and is considerably faster and more efficient.

16.
The partial least-squares (PLS) algorithm has become popular for exploratory multivariate data analysis and for multivariate calibration. The same PLS algorithm can also be used for confirmatory data analysis. The discussion is limited to the analysis of a single response variable. A close correspondence between PLS1 regression and classical analysis of variance (ANOVA) is demonstrated. The design of an experiment is described in terms of discrete design variables (dummy variables) for main effects and simple interactions. These are used as regressors X = (x1, x2, …) for modelling the response variable of the experiment, y. As in conventional use of PLS1 regression, the algorithm gives a concentrated model or diagram of the most important y-relevant types of variability in the X-data. Here, this is the combination of design variables that models the variations in y. A simple plot of the resulting factor loadings immediately reveals the important design variables. Statistical tests and confidence regions in the PLS solution give additional safeguards against the interpretation of spurious effects. The method is applied to two data sets. One concerns the assessment of personal preference for blackcurrant juice, studied in a 2^5 factorial experiment; these data are also analysed with missing values and as fractional factorials. The other concerns spectrophotometric, absorbance-based colour assessment of pigment in strawberry jam in a three-factor design with 2, 2 and 3 levels in the respective factors.
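The correspondence between PLS1 and ANOVA can be seen in a minimal NIPALS PLS1 sketch on a coded two-level design. Assume a balanced full 2^3 factorial with ±1 coding and a hypothetical noise-free response. The first PLS weight vector is then proportional to X^T y, which is itself proportional to the classical effect estimates (high-level mean minus low-level mean). Because the design columns are orthogonal, the first loadings equal the weights. The design, coefficients and function name are illustrative, and the significance tests described in the abstract are not reproduced.

```python
import itertools

import numpy as np


def pls1_nipals(X, y, n_comp):
    """Minimal PLS1 (NIPALS). Returns weights W and loadings P, one column per component."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    W, P = [], []
    for _ in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)   # weight: direction of maximal covariance with y
        t = X @ w                # scores
        p = X.T @ t / (t @ t)    # X loadings
        q = y @ t / (t @ t)      # y loading
        X = X - np.outer(t, p)   # deflate X
        y = y - q * t            # deflate y
        W.append(w)
        P.append(p)
    return np.array(W).T, np.array(P).T


levels = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))  # 8 runs
x1, x2, x3 = levels.T
X = np.column_stack([x1, x2, x3, x1 * x2])  # main effects + one interaction dummy
y = 10.0 + 4.0 * x1 - 2.0 * x3              # hypothetical response
W, P = pls1_nipals(X, y, n_comp=1)
effects = 2.0 * X.T @ (y - y.mean()) / len(y)  # high mean minus low mean
print(np.round(W[:, 0], 3))  # proportional to [2, 0, -1, 0]
print(effects)               # [ 8.  0. -4.  0.]
```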

17.
This work investigates the ability of multiplicative (product-unit-based) and sigmoidal neural network models, built by an evolutionary algorithm, to quantify highly overlapping chromatographic peaks. To test this approach, two N-methylcarbamate pesticides, carbofuran and propoxur, were quantified using a classic peroxyoxalate chemiluminescence reaction as the detection system for chromatographic analysis. A four-parameter Weibull curve was fitted to the chromatographic peak profile by the Levenberg-Marquardt method and used as input data for both models. Simple network topologies (a single output) allowed the analytes to be quantified with great accuracy and precision. Product unit neural networks provided better information ability, smaller network architectures, and more robust models (smaller standard deviation). The reduced size of the selected models enabled the derivation of simple quantification equations that transform the input variables into the output variable. These equations are easier to interpret from a chemical point of view than those provided by sigmoidal neural networks. The effect of both analytes on the characteristics of the chromatographic bands, namely profile, dispersion, peak height, and residence time, can also be readily established.

18.
19.
20.
This investigation describes an optically transparent antistatic film composed of antimony-doped tin oxide (ATO) nanoparticles dispersed in a polymer matrix, with remarkably improved electrical and optical properties. The film is fabricated on the basis of a synergistic interaction between self-assembling nanoparticles and self-organizing matrix materials. The film becomes antistatic at ATO concentrations above a threshold value. A scaling analysis of the data yields an extremely low critical concentration (0.0020 volume fraction), considerably lower than the value predicted by percolation theory. Microscopic observations of the film reveal a characteristic microstructure: "single-stranded" chainlike (linear or fibrous) aggregates of ATO nanoparticles and large ATO-depleted areas. The experimental results suggest that the high optical transparency and the low critical concentration derive from this characteristic microstructure.
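The abstract does not state the scaling law used. A common choice in percolation analysis is σ = A(φ - φ_c)^t above the threshold φ_c. The sketch below assumes that form and estimates φ_c, t and A by scanning candidate thresholds and fitting a straight line in log-log space. The data are synthetic and noise-free. They reuse the reported φ_c = 0.0020 only for illustration, and A = 5 and t = 2 are hypothetical.

```python
import numpy as np


def fit_percolation(phi, sigma, phi_c_grid):
    """Fit sigma = A * (phi - phi_c)**t by scanning candidate thresholds.

    For each candidate phi_c, regress log(sigma) on log(phi - phi_c) and keep
    the candidate with the smallest residual sum of squares. All candidates
    must lie below min(phi), so every fit uses the same points.
    Returns (phi_c, t, A).
    """
    phi = np.asarray(phi, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    if np.max(phi_c_grid) >= phi.min():
        raise ValueError("all candidate thresholds must lie below the smallest phi")
    ly = np.log(sigma)
    best = None
    for pc in phi_c_grid:
        lx = np.log(phi - pc)
        t, log_a = np.polyfit(lx, ly, 1)  # slope = exponent t, intercept = log A
        resid = ly - (t * lx + log_a)
        sse = float(resid @ resid)
        if best is None or sse < best[0]:
            best = (sse, float(pc), float(t), float(np.exp(log_a)))
    return best[1:]


phi = np.array([0.003, 0.004, 0.006, 0.01, 0.02, 0.05])  # volume fractions
sigma = 5.0 * (phi - 0.002) ** 2.0                       # synthetic conductivity
phi_c, t, A = fit_percolation(phi, sigma, np.linspace(0.0, 0.0029, 30))
print(round(phi_c, 4), round(t, 3), round(A, 3))
```

With real, noisy data the residual minimum is shallower. Points close to the threshold should then be weighted with care, because they dominate the log-log fit.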
